commit
8de1aa2082
@ -0,0 +1,173 @@
|
|||||||
|
# Chatter
|
||||||
|
|
||||||
|
Chatter is a tool designed to be a self-hosted chat cog.
|
||||||
|
|
||||||
|
It is based on the brilliant work over at [Chatterbot](https://github.com/gunthercox/ChatterBot) and [spaCy](https://github.com/explosion/spaCy).
|
||||||
|
|
||||||
|
|
||||||
|
## Known Issues
|
||||||
|
|
||||||
|
* Chatter will not reload
|
||||||
|
* Causes this error:
|
||||||
|
```
|
||||||
|
chatterbot.adapters.Adapter.InvalidAdapterTypeException: chatterbot.storage.SQLStorageAdapter must be a subclass of StorageAdapter
|
||||||
|
```
|
||||||
|
* Chatter responses are slow
|
||||||
|
* This is an unfortunate side effect of running self-hosted machine learning on a Discord bot.
|
||||||
|
* This version includes a number of attempts at improving this, but there is only so much that can be done.
|
||||||
|
* Chatter responses are irrelevant
|
||||||
|
* This can be caused by bad training, but sometimes the data just doesn't come together right.
|
||||||
|
* Asking for better accuracy often leads to slower responses as well, so I've leaned towards speed over accuracy.
|
||||||
|
* Chatter installation is not working
|
||||||
|
* See installation instructions below
|
||||||
|
|
||||||
|
## Warning
|
||||||
|
|
||||||
|
**Chatter is a CPU, RAM, and Disk intensive cog.**
|
||||||
|
|
||||||
|
Chatter by default uses spaCy's `en_core_web_md` training model, which is ~50 MB
|
||||||
|
|
||||||
|
Chatter can potentially use spaCy's `en_core_web_lg` training model, which is ~800 MB
|
||||||
|
|
||||||
|
Chatter uses a SQLite database that can potentially take up a large amount of disk space,
|
||||||
|
depending on how much training Chatter has done.
|
||||||
|
|
||||||
|
The sqlite database can be safely deleted at any time. Deletion will only erase training data.
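
One way to keep an eye on disk usage is to check the size of the database file from time to time. The sketch below assumes you already know where the file lives. The path shown is only a placeholder, not Chatter's actual storage location, and `database_size_mb` is a hypothetical helper rather than part of the cog.

```python
import os


def database_size_mb(path):
    """Return the size of the file at `path` in megabytes, or 0.0 if it is missing."""
    if not os.path.isfile(path):
        return 0.0
    return os.path.getsize(path) / (1024 * 1024)


if __name__ == "__main__":
    # Placeholder path: replace with the location of your own database file.
    db_path = "path/to/chatter/database.sqlite3"
    print("Database size: {:.1f} MB".format(database_size_mb(db_path)))
```

If the reported size grows beyond what you are comfortable with, deleting the file resets the training data as described above.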
|
||||||
|
|
||||||
|
|
||||||
|
# Installation
|
||||||
|
The installation is currently very tricky on Windows.
|
||||||
|
|
||||||
|
There are a number of reasons for this, but the main ones are as follows:
|
||||||
|
* Using a dev version of chatterbot
|
||||||
|
* Some chatterbot requirements conflict with Red's (as of 3.10)
|
||||||
|
* spaCy version is newer than chatterbot's requirements
|
||||||
|
* A symlink in spaCy to map `en` to `en_core_web_sm` requires admin permissions on Windows
|
||||||
|
* C++ Build tools are required on Windows for spaCy
|
||||||
|
* Pandoc is required for something on Windows, but I can't remember what
|
||||||
|
|
||||||
|
Linux is a bit easier, but has only been tested on Debian and Ubuntu.
|
||||||
|
|
||||||
|
## Windows Prerequisites
|
||||||
|
|
||||||
|
Install these on your Windows machine before attempting the installation:
|
||||||
|
|
||||||
|
[Visual Studio C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
|
||||||
|
|
||||||
|
[Pandoc - Universal Document Converter](https://pandoc.org/installing.html)
|
||||||
|
|
||||||
|
## Methods
|
||||||
|
### Windows - Manually
|
||||||
|
#### Step 1: Built-in Downloader
|
||||||
|
|
||||||
|
You need to get a copy of the `requirements.txt` provided with Chatter. I recommend using the built-in downloader to get it:
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]repo add Fox https://github.com/bobloy/Fox-V3
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 2: Install Requirements
|
||||||
|
|
||||||
|
Make sure you have your virtual environment that you installed Red on activated before starting this step. See the Red Docs for details on how.
|
||||||
|
|
||||||
|
In a terminal running as an admin, navigate to the directory containing this repo.
|
||||||
|
|
||||||
|
I've used my install directory as an example.
|
||||||
|
|
||||||
|
```
|
||||||
|
cd C:\Users\Bobloy\AppData\Local\Red-DiscordBot\Red-DiscordBot\data\bobbot\cogs\RepoManager\repos\Fox\chatter
|
||||||
|
pip install -r requirements.txt
|
||||||
|
pip install --no-deps "chatterbot>=1.1"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 3: Load Chatter
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]cog install Fox chatter
|
||||||
|
[p]load chatter
|
||||||
|
```
|
||||||
|
|
||||||
|
### Linux - Manually
|
||||||
|
|
||||||
|
#### Step 1: Built-in Downloader
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]cog install Chatter
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 2: Install Requirements
|
||||||
|
|
||||||
|
In your console with your virtual environment activated:
|
||||||
|
|
||||||
|
```
|
||||||
|
pip install --no-deps "chatterbot>=1.1"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 3: Load Chatter
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]load chatter
|
||||||
|
```
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
|
||||||
|
Chatter works out of the box without any training by learning as it goes,
|
||||||
|
but will have very poor and repetitive responses at first.
|
||||||
|
|
||||||
|
Initial training is recommended to speed up its learning.
|
||||||
|
|
||||||
|
## Training Setup
|
||||||
|
|
||||||
|
### Minutes
|
||||||
|
```
|
||||||
|
[p]chatter minutes X
|
||||||
|
```
|
||||||
|
This command configures what Chatter considers the maximum number of minutes
|
||||||
|
that can pass between statements before considering it a new conversation.
|
||||||
|
|
||||||
|
Servers with lots of activity should set this low, whereas servers with low activity
|
||||||
|
will want this number to be fairly high.
|
||||||
|
|
||||||
|
This is only used during training.
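
To make the effect of this setting concrete, here is a minimal sketch of how messages could be grouped into conversations by time gap. It is an illustration only, not Chatter's actual implementation: `split_conversations` and the `(timestamp, text)` message format are assumptions made for the example.

```python
from datetime import datetime, timedelta


def split_conversations(messages, max_gap_minutes):
    """Group (timestamp, text) pairs into lists of texts, one list per conversation.

    A new conversation starts whenever the gap between two consecutive
    messages is longer than `max_gap_minutes`.
    """
    max_gap = timedelta(minutes=max_gap_minutes)
    conversations = []
    current = []
    previous_time = None
    for timestamp, text in sorted(messages, key=lambda item: item[0]):
        if previous_time is not None and timestamp - previous_time > max_gap:
            conversations.append(current)
            current = []
        current.append(text)
        previous_time = timestamp
    if current:
        conversations.append(current)
    return conversations


start = datetime(2020, 1, 1, 12, 0)
example_messages = [
    (start, "hello"),
    (start + timedelta(minutes=2), "hi there"),
    (start + timedelta(minutes=45), "anyone around?"),
]
# With a 40 minute limit the 43 minute pause starts a second conversation.
print(split_conversations(example_messages, max_gap_minutes=40))
```

With a limit of 60 minutes the same three messages would stay in a single conversation, which is why quiet servers benefit from a higher value.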
|
||||||
|
|
||||||
|
### Age
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]chatter age X
|
||||||
|
```
|
||||||
|
This command configures the maximum number of days Chatter will look back when
|
||||||
|
gathering messages for training.
|
||||||
|
|
||||||
|
Setting this to be extremely high is not recommended due to the increased disk space required to store
|
||||||
|
the data. Additionally, higher numbers will increase the training time tremendously.
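
As a rough illustration of what the age limit means, the sketch below computes a cutoff date and filters messages against it. The names `training_cutoff` and `recent_enough` are hypothetical and are not taken from Chatter's code.

```python
from datetime import datetime, timedelta


def training_cutoff(now, max_age_days):
    """Return the oldest timestamp that still counts for training."""
    return now - timedelta(days=max_age_days)


def recent_enough(message_time, now, max_age_days):
    """Return True if the message is no older than `max_age_days` days."""
    return message_time >= training_cutoff(now, max_age_days)


now = datetime(2020, 6, 30)
timestamps = [datetime(2020, 6, 29), datetime(2020, 6, 1), datetime(2020, 1, 1)]
# Keep only the messages from the last 30 days.
kept = [stamp for stamp in timestamps if recent_enough(stamp, now, 30)]
print(kept)
```

Every extra day widens the window of messages that has to be fetched, stored, and processed, which is where the additional disk space and training time come from.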
|
||||||
|
|
||||||
|
|
||||||
|
## Training
|
||||||
|
|
||||||
|
### Train English
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]chatter trainenglish
|
||||||
|
```
|
||||||
|
|
||||||
|
This will train Chatter on basic English greetings and conversations.
|
||||||
|
This is far from complete, but can act as a good base point for new installations.
|
||||||
|
|
||||||
|
### Train Channel
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]chatter train #channel_name
|
||||||
|
```
|
||||||
|
This command trains Chatter on the specified channel based on the configured
|
||||||
|
settings. This can take a long time to process.
|
||||||
|
|
||||||
|
|
||||||
|
## Switching Algorithms
|
||||||
|
|
||||||
|
```
|
||||||
|
[p]chatter algorithm X
|
||||||
|
```
|
||||||
|
|
||||||
|
Chatter can be configured to use one of three different Similarity algorithms.
|
||||||
|
|
||||||
|
Changing this can help if the response speed is too slow, but can reduce the accuracy of results.
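
To show why different similarity algorithms can give different answers for the same input, the sketch below scores one pair of sentences in two ways using only the Python standard library. These two functions are illustrative stand-ins and are not necessarily the algorithms Chatter offers: `character_similarity` compares the raw characters, while `jaccard_similarity` compares the sets of words.

```python
from difflib import SequenceMatcher


def character_similarity(a, b):
    """Character-level similarity ratio in [0, 1] using difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_similarity(a, b):
    """Word-level Jaccard index: shared words divided by total distinct words."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


first = "the young cat is hungry"
second = "the cat is very hungry"
print("character:", round(character_similarity(first, second), 2))
print("jaccard:  ", round(jaccard_similarity(first, second), 2))
```

Cheaper comparisons like these only look at surface text, whereas comparisons that rely on a language model usually cost more time per message, which is the kind of speed and accuracy trade-off described above.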
|
@ -1,13 +0,0 @@
|
|||||||
"""
|
|
||||||
ChatterBot is a machine learning, conversational dialog engine.
|
|
||||||
"""
|
|
||||||
from .chatterbot import ChatBot
|
|
||||||
|
|
||||||
__version__ = '0.8.5'
|
|
||||||
__author__ = 'Gunther Cox'
|
|
||||||
__email__ = 'gunthercx@gmail.com'
|
|
||||||
__url__ = 'https://github.com/gunthercox/ChatterBot'
|
|
||||||
|
|
||||||
__all__ = (
|
|
||||||
'ChatBot',
|
|
||||||
)
|
|
@ -1,22 +0,0 @@
|
|||||||
import sys
|
|
||||||
|
|
||||||
if __name__ == '__main__':
|
|
||||||
import importlib
|
|
||||||
|
|
||||||
if '--version' in sys.argv:
|
|
||||||
chatterbot = importlib.import_module('chatterbot')
|
|
||||||
print(chatterbot.__version__)
|
|
||||||
|
|
||||||
if 'list_nltk_data' in sys.argv:
|
|
||||||
import os
|
|
||||||
import nltk.data
|
|
||||||
|
|
||||||
data_directories = []
|
|
||||||
|
|
||||||
# Find each data directory in the NLTK path that has content
|
|
||||||
for path in nltk.data.path:
|
|
||||||
if os.path.exists(path):
|
|
||||||
if os.listdir(path):
|
|
||||||
data_directories.append(path)
|
|
||||||
|
|
||||||
print(os.linesep.join(data_directories))
|
|
@ -1,47 +0,0 @@
|
|||||||
import logging
|
|
||||||
|
|
||||||
|
|
||||||
class Adapter(object):
|
|
||||||
"""
|
|
||||||
A superclass for all adapter classes.
|
|
||||||
|
|
||||||
:param logger: A python logger.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
self.logger = kwargs.get('logger', logging.getLogger(__name__))
|
|
||||||
self.chatbot = kwargs.get('chatbot')
|
|
||||||
|
|
||||||
def set_chatbot(self, chatbot):
|
|
||||||
"""
|
|
||||||
Gives the adapter access to an instance of the ChatBot class.
|
|
||||||
|
|
||||||
:param chatbot: A chat bot instance.
|
|
||||||
:type chatbot: ChatBot
|
|
||||||
"""
|
|
||||||
self.chatbot = chatbot
|
|
||||||
|
|
||||||
class AdapterMethodNotImplementedError(NotImplementedError):
|
|
||||||
"""
|
|
||||||
An exception to be raised when an adapter method has not been implemented.
|
|
||||||
Typically this indicates that the developer is expected to implement the
|
|
||||||
method in a subclass.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, message=None):
|
|
||||||
"""
|
|
||||||
Set the message for the exception.
|
|
||||||
"""
|
|
||||||
if not message:
|
|
||||||
message = 'This method must be overridden in a subclass method.'
|
|
||||||
self.message = message
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return self.message
|
|
||||||
|
|
||||||
class InvalidAdapterTypeException(Exception):
|
|
||||||
"""
|
|
||||||
An exception to be raised when an adapter
|
|
||||||
of an unexpected class type is received.
|
|
||||||
"""
|
|
||||||
pass
|
|
@ -1,172 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
import logging
|
|
||||||
|
|
||||||
from . import utils
|
|
||||||
|
|
||||||
|
|
||||||
class ChatBot(object):
|
|
||||||
"""
|
|
||||||
A conversational dialog chat bot.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, name, **kwargs):
|
|
||||||
from .logic import MultiLogicAdapter
|
|
||||||
|
|
||||||
self.name = name
|
|
||||||
kwargs['name'] = name
|
|
||||||
kwargs['chatbot'] = self
|
|
||||||
|
|
||||||
self.default_session = None
|
|
||||||
|
|
||||||
storage_adapter = kwargs.get('storage_adapter', 'chatter.chatterbot.storage.SQLStorageAdapter')
|
|
||||||
|
|
||||||
logic_adapters = kwargs.get('logic_adapters', [
|
|
||||||
'chatter.chatterbot.logic.BestMatch'
|
|
||||||
])
|
|
||||||
|
|
||||||
input_adapter = kwargs.get('input_adapter', 'chatter.chatterbot.input.VariableInputTypeAdapter')
|
|
||||||
|
|
||||||
output_adapter = kwargs.get('output_adapter', 'chatter.chatterbot.output.OutputAdapter')
|
|
||||||
|
|
||||||
# Check that each adapter is a valid subclass of its respective parent
|
|
||||||
# utils.validate_adapter_class(storage_adapter, StorageAdapter)
|
|
||||||
# utils.validate_adapter_class(input_adapter, InputAdapter)
|
|
||||||
# utils.validate_adapter_class(output_adapter, OutputAdapter)
|
|
||||||
|
|
||||||
self.logic = MultiLogicAdapter(**kwargs)
|
|
||||||
self.storage = utils.initialize_class(storage_adapter, **kwargs)
|
|
||||||
self.input = utils.initialize_class(input_adapter, **kwargs)
|
|
||||||
self.output = utils.initialize_class(output_adapter, **kwargs)
|
|
||||||
|
|
||||||
filters = kwargs.get('filters', tuple())
|
|
||||||
self.filters = tuple([utils.import_module(F)() for F in filters])
|
|
||||||
|
|
||||||
# Add required system logic adapter
|
|
||||||
self.logic.system_adapters.append(
|
|
||||||
utils.initialize_class('chatter.chatterbot.logic.NoKnowledgeAdapter', **kwargs)
|
|
||||||
)
|
|
||||||
|
|
||||||
for adapter in logic_adapters:
|
|
||||||
self.logic.add_adapter(adapter, **kwargs)
|
|
||||||
|
|
||||||
# Add the chatbot instance to each adapter to share information such as
|
|
||||||
# the name, the current conversation, or other adapters
|
|
||||||
self.logic.set_chatbot(self)
|
|
||||||
self.input.set_chatbot(self)
|
|
||||||
self.output.set_chatbot(self)
|
|
||||||
|
|
||||||
preprocessors = kwargs.get(
|
|
||||||
'preprocessors', [
|
|
||||||
'chatter.chatterbot.preprocessors.clean_whitespace'
|
|
||||||
]
|
|
||||||
)
|
|
||||||
|
|
||||||
self.preprocessors = []
|
|
||||||
|
|
||||||
for preprocessor in preprocessors:
|
|
||||||
self.preprocessors.append(utils.import_module(preprocessor))
|
|
||||||
|
|
||||||
# Use specified trainer or fall back to the default
|
|
||||||
trainer = kwargs.get('trainer', 'chatter.chatterbot.trainers.Trainer')
|
|
||||||
TrainerClass = utils.import_module(trainer)
|
|
||||||
self.trainer = TrainerClass(self.storage, **kwargs)
|
|
||||||
self.training_data = kwargs.get('training_data')
|
|
||||||
|
|
||||||
self.default_conversation_id = None
|
|
||||||
|
|
||||||
self.logger = kwargs.get('logger', logging.getLogger(__name__))
|
|
||||||
|
|
||||||
# Allow the bot to save input it receives so that it can learn
|
|
||||||
self.read_only = kwargs.get('read_only', False)
|
|
||||||
|
|
||||||
if kwargs.get('initialize', True):
|
|
||||||
self.initialize()
|
|
||||||
|
|
||||||
def initialize(self):
|
|
||||||
"""
|
|
||||||
Do any work that needs to be done before the responses can be returned.
|
|
||||||
"""
|
|
||||||
self.logic.initialize()
|
|
||||||
|
|
||||||
def get_response(self, input_item, conversation_id=None):
|
|
||||||
"""
|
|
||||||
Return the bot's response based on the input.
|
|
||||||
|
|
||||||
:param input_item: An input value.
|
|
||||||
:param conversation_id: The id of a conversation.
|
|
||||||
:returns: A response to the input.
|
|
||||||
:rtype: Statement
|
|
||||||
"""
|
|
||||||
if not conversation_id:
|
|
||||||
if not self.default_conversation_id:
|
|
||||||
self.default_conversation_id = self.storage.create_conversation()
|
|
||||||
conversation_id = self.default_conversation_id
|
|
||||||
|
|
||||||
input_statement = self.input.process_input_statement(input_item)
|
|
||||||
|
|
||||||
# Preprocess the input statement
|
|
||||||
for preprocessor in self.preprocessors:
|
|
||||||
input_statement = preprocessor(self, input_statement)
|
|
||||||
|
|
||||||
statement, response = self.generate_response(input_statement, conversation_id)
|
|
||||||
|
|
||||||
# Learn that the user's input was a valid response to the chat bot's previous output
|
|
||||||
previous_statement = self.storage.get_latest_response(conversation_id)
|
|
||||||
|
|
||||||
if not self.read_only:
|
|
||||||
self.learn_response(statement, previous_statement)
|
|
||||||
self.storage.add_to_conversation(conversation_id, statement, response)
|
|
||||||
|
|
||||||
# Process the response output with the output adapter
|
|
||||||
return self.output.process_response(response, conversation_id)
|
|
||||||
|
|
||||||
def generate_response(self, input_statement, conversation_id):
|
|
||||||
"""
|
|
||||||
Return a response based on a given input statement.
|
|
||||||
"""
|
|
||||||
self.storage.generate_base_query(self, conversation_id)
|
|
||||||
|
|
||||||
# Select a response to the input statement
|
|
||||||
response = self.logic.process(input_statement)
|
|
||||||
|
|
||||||
return input_statement, response
|
|
||||||
|
|
||||||
def learn_response(self, statement, previous_statement):
|
|
||||||
"""
|
|
||||||
Learn that the statement provided is a valid response.
|
|
||||||
"""
|
|
||||||
from .conversation import Response
|
|
||||||
|
|
||||||
if previous_statement:
|
|
||||||
statement.add_response(
|
|
||||||
Response(previous_statement.text)
|
|
||||||
)
|
|
||||||
self.logger.info('Adding "{}" as a response to "{}"'.format(
|
|
||||||
statement.text,
|
|
||||||
previous_statement.text
|
|
||||||
))
|
|
||||||
|
|
||||||
# Save the statement after selecting a response
|
|
||||||
self.storage.update(statement)
|
|
||||||
|
|
||||||
def set_trainer(self, training_class, **kwargs):
|
|
||||||
"""
|
|
||||||
Set the module used to train the chatbot.
|
|
||||||
|
|
||||||
:param training_class: The training class to use for the chat bot.
|
|
||||||
:type training_class: `Trainer`
|
|
||||||
|
|
||||||
:param \**kwargs: Any parameters that should be passed to the training class.
|
|
||||||
"""
|
|
||||||
if 'chatbot' not in kwargs:
|
|
||||||
kwargs['chatbot'] = self
|
|
||||||
|
|
||||||
self.trainer = training_class(self.storage, **kwargs)
|
|
||||||
|
|
||||||
@property
|
|
||||||
def train(self):
|
|
||||||
"""
|
|
||||||
Proxy method to the chat bot's trainer class.
|
|
||||||
"""
|
|
||||||
return self.trainer.train
|
|
@ -1,325 +0,0 @@
|
|||||||
# -*- coding: utf-8 -*-
|
|
||||||
|
|
||||||
|
|
||||||
"""
|
|
||||||
This module contains various text-comparison algorithms
|
|
||||||
designed to compare one statement to another.
|
|
||||||
"""
|
|
||||||
|
|
||||||
# Use python-Levenshtein if available
|
|
||||||
try:
|
|
||||||
from Levenshtein.StringMatcher import StringMatcher as SequenceMatcher
|
|
||||||
except ImportError:
|
|
||||||
from difflib import SequenceMatcher
|
|
||||||
|
|
||||||
|
|
||||||
class Comparator:
|
|
||||||
|
|
||||||
def __call__(self, statement_a, statement_b):
|
|
||||||
return self.compare(statement_a, statement_b)
|
|
||||||
|
|
||||||
def compare(self, statement_a, statement_b):
|
|
||||||
return 0
|
|
||||||
|
|
||||||
def get_initialization_functions(self):
|
|
||||||
"""
|
|
||||||
Return all initialization methods for the comparison algorithm.
|
|
||||||
Initialization methods must start with 'initialize_' and
|
|
||||||
take no parameters.
|
|
||||||
"""
|
|
||||||
initialization_methods = [
|
|
||||||
(
|
|
||||||
method,
|
|
||||||
getattr(self, method),
|
|
||||||
) for method in dir(self) if method.startswith('initialize_')
|
|
||||||
]
|
|
||||||
|
|
||||||
return {
|
|
||||||
key: value for (key, value) in initialization_methods
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
class LevenshteinDistance(Comparator):
|
|
||||||
"""
|
|
||||||
Compare two statements based on the Levenshtein distance
|
|
||||||
of each statement's text.
|
|
||||||
|
|
||||||
For example, there is a 65% similarity between the statements
|
|
||||||
"where is the post office?" and "looking for the post office"
|
|
||||||
based on the Levenshtein distance algorithm.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def compare(self, statement, other_statement):
|
|
||||||
"""
|
|
||||||
Compare the two input statements.
|
|
||||||
|
|
||||||
:return: The percent of similarity between the text of the statements.
|
|
||||||
:rtype: float
|
|
||||||
"""
|
|
||||||
|
|
||||||
# Return 0 if either statement has a falsy text value
|
|
||||||
if not statement.text or not other_statement.text:
|
|
||||||
return 0
|
|
||||||
|
|
||||||
# Get the lowercase version of both strings
|
|
||||||
|
|
||||||
statement_text = str(statement.text.lower())
|
|
||||||
other_statement_text = str(other_statement.text.lower())
|
|
||||||
|
|
||||||
similarity = SequenceMatcher(
|
|
||||||
None,
|
|
||||||
statement_text,
|
|
||||||
other_statement_text
|
|
||||||
)
|
|
||||||
|
|
||||||
# Calculate a decimal percent of the similarity
|
|
||||||
percent = round(similarity.ratio(), 2)
|
|
||||||
|
|
||||||
return percent
|
|
||||||
|
|
||||||
|
|
||||||
class SynsetDistance(Comparator):
|
|
||||||
"""
|
|
||||||
Calculate the similarity of two statements.
|
|
||||||
This is based on the total maximum synset similarity between each word in each sentence.
|
|
||||||
|
|
||||||
This algorithm uses the `wordnet`_ functionality of `NLTK`_ to determine the similarity
|
|
||||||
of two statements based on the path similarity between each token of each statement.
|
|
||||||
This is essentially an evaluation of the closeness of synonyms.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def initialize_nltk_wordnet(self):
|
|
||||||
"""
|
|
||||||
Download required NLTK corpora if they have not already been downloaded.
|
|
||||||
"""
|
|
||||||
from .utils import nltk_download_corpus
|
|
||||||
|
|
||||||
nltk_download_corpus('corpora/wordnet')
|
|
||||||
|
|
||||||
def initialize_nltk_punkt(self):
|
|
||||||
"""
|
|
||||||
Download required NLTK corpora if they have not already been downloaded.
|
|
||||||
"""
|
|
||||||
from .utils import nltk_download_corpus
|
|
||||||
|
|
||||||
nltk_download_corpus('tokenizers/punkt')
|
|
||||||
|
|
||||||
def initialize_nltk_stopwords(self):
|
|
||||||
"""
|
|
||||||
Download required NLTK corpora if they have not already been downloaded.
|
|
||||||
"""
|
|
||||||
from .utils import nltk_download_corpus
|
|
||||||
|
|
||||||
nltk_download_corpus('corpora/stopwords')
|
|
||||||
|
|
||||||
def compare(self, statement, other_statement):
|
|
||||||
"""
|
|
||||||
Compare the two input statements.
|
|
||||||
|
|
||||||
:return: The percent of similarity between the closest synset distance.
|
|
||||||
:rtype: float
|
|
||||||
|
|
||||||
.. _wordnet: http://www.nltk.org/howto/wordnet.html
|
|
||||||
.. _NLTK: http://www.nltk.org/
|
|
||||||
"""
|
|
||||||
from nltk.corpus import wordnet
|
|
||||||
from nltk import word_tokenize
|
|
||||||
from . import utils
|
|
||||||
import itertools
|
|
||||||
|
|
||||||
tokens1 = word_tokenize(statement.text.lower())
|
|
||||||
tokens2 = word_tokenize(other_statement.text.lower())
|
|
||||||
|
|
||||||
# Remove all stop words from the list of word tokens
|
|
||||||
tokens1 = utils.remove_stopwords(tokens1, language='english')
|
|
||||||
tokens2 = utils.remove_stopwords(tokens2, language='english')
|
|
||||||
|
|
||||||
# The maximum possible similarity is an exact match
|
|
||||||
# Because path_similarity returns a value between 0 and 1,
|
|
||||||
# max_possible_similarity is the number of words in the longer
|
|
||||||
# of the two input statements.
|
|
||||||
max_possible_similarity = max(
|
|
||||||
len(statement.text.split()),
|
|
||||||
len(other_statement.text.split())
|
|
||||||
)
|
|
||||||
|
|
||||||
max_similarity = 0.0
|
|
||||||
|
|
||||||
# Get the highest matching value for each possible combination of words
|
|
||||||
for combination in itertools.product(*[tokens1, tokens2]):
|
|
||||||
|
|
||||||
synset1 = wordnet.synsets(combination[0])
|
|
||||||
synset2 = wordnet.synsets(combination[1])
|
|
||||||
|
|
||||||
if synset1 and synset2:
|
|
||||||
|
|
||||||
# Get the highest similarity for each combination of synsets
|
|
||||||
for synset in itertools.product(*[synset1, synset2]):
|
|
||||||
similarity = synset[0].path_similarity(synset[1])
|
|
||||||
|
|
||||||
if similarity and (similarity > max_similarity):
|
|
||||||
max_similarity = similarity
|
|
||||||
|
|
||||||
if max_possible_similarity == 0:
|
|
||||||
return 0
|
|
||||||
|
|
||||||
return max_similarity / max_possible_similarity
|
|
||||||
|
|
||||||
|
|
||||||
class SentimentComparison(Comparator):
|
|
||||||
"""
|
|
||||||
Calculate the similarity of two statements based on the closeness of
|
|
||||||
the sentiment value calculated for each statement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def initialize_nltk_vader_lexicon(self):
|
|
||||||
"""
|
|
||||||
Download the NLTK vader lexicon for sentiment analysis
|
|
||||||
that is required for this algorithm to run.
|
|
||||||
"""
|
|
||||||
from .utils import nltk_download_corpus
|
|
||||||
|
|
||||||
nltk_download_corpus('sentiment/vader_lexicon')
|
|
||||||
|
|
||||||
def compare(self, statement, other_statement):
|
|
||||||
"""
|
|
||||||
Return the similarity of two statements based on
|
|
||||||
their calculated sentiment values.
|
|
||||||
|
|
||||||
:return: The percent of similarity between the sentiment value.
|
|
||||||
:rtype: float
|
|
||||||
"""
|
|
||||||
from nltk.sentiment.vader import SentimentIntensityAnalyzer
|
|
||||||
|
|
||||||
sentiment_analyzer = SentimentIntensityAnalyzer()
|
|
||||||
statement_polarity = sentiment_analyzer.polarity_scores(statement.text.lower())
|
|
||||||
statement2_polarity = sentiment_analyzer.polarity_scores(other_statement.text.lower())
|
|
||||||
|
|
||||||
statement_greatest_polarity = 'neu'
|
|
||||||
statement_greatest_score = -1
|
|
||||||
for polarity in sorted(statement_polarity):
|
|
||||||
if statement_polarity[polarity] > statement_greatest_score:
|
|
||||||
statement_greatest_polarity = polarity
|
|
||||||
statement_greatest_score = statement_polarity[polarity]
|
|
||||||
|
|
||||||
statement2_greatest_polarity = 'neu'
|
|
||||||
statement2_greatest_score = -1
|
|
||||||
for polarity in sorted(statement2_polarity):
|
|
||||||
if statement2_polarity[polarity] > statement2_greatest_score:
|
|
||||||
statement2_greatest_polarity = polarity
|
|
||||||
statement2_greatest_score = statement2_polarity[polarity]
|
|
||||||
|
|
||||||
# Check if the polarity is of a different type
|
|
||||||
if statement_greatest_polarity != statement2_greatest_polarity:
|
|
||||||
return 0
|
|
||||||
|
|
||||||
values = [statement_greatest_score, statement2_greatest_score]
|
|
||||||
difference = max(values) - min(values)
|
|
||||||
|
|
||||||
return 1.0 - difference
|
|
||||||
|
|
||||||
|
|
||||||
class JaccardSimilarity(Comparator):
|
|
||||||
"""
|
|
||||||
Calculates the similarity of two statements based on the Jaccard index.
|
|
||||||
|
|
||||||
The Jaccard index is composed of a numerator and denominator.
|
|
||||||
In the numerator, we count the number of items that are shared between the sets.
|
|
||||||
In the denominator, we count the total number of items across both sets.
|
|
||||||
Let's say we define sentences to be equivalent if 50% or more of their tokens are equivalent.
|
|
||||||
Here are two sample sentences:
|
|
||||||
|
|
||||||
The young cat is hungry.
|
|
||||||
The cat is very hungry.
|
|
||||||
|
|
||||||
When we parse these sentences to remove stopwords, we end up with the following two sets:
|
|
||||||
|
|
||||||
{young, cat, hungry}
|
|
||||||
{cat, very, hungry}
|
|
||||||
|
|
||||||
In our example above, our intersection is {cat, hungry}, which has count of two.
|
|
||||||
The union of the sets is {young, cat, very, hungry}, which has a count of four.
|
|
||||||
Therefore, our `Jaccard similarity index`_ is two divided by four, or 50%.
|
|
||||||
Given our similarity threshold above, we would consider this to be a match.
|
|
||||||
|
|
||||||
.. _`Jaccard similarity index`: https://en.wikipedia.org/wiki/Jaccard_index
|
|
||||||
"""
|
|
||||||
|
|
||||||
SIMILARITY_THRESHOLD = 0.5
|
|
||||||
|
|
||||||
def initialize_nltk_wordnet(self):
|
|
||||||
"""
|
|
||||||
Download the NLTK wordnet corpora that is required for this algorithm
|
|
||||||
to run only if the corpora has not already been downloaded.
|
|
||||||
"""
|
|
||||||
from .utils import nltk_download_corpus
|
|
||||||
|
|
||||||
nltk_download_corpus('corpora/wordnet')
|
|
||||||
|
|
||||||
def compare(self, statement, other_statement):
|
|
||||||
"""
|
|
||||||
Return the calculated similarity of two
|
|
||||||
statements based on the Jaccard index.
|
|
||||||
"""
|
|
||||||
from nltk.corpus import wordnet
|
|
||||||
import nltk
|
|
||||||
import string
|
|
||||||
|
|
||||||
a = statement.text.lower()
|
|
||||||
b = other_statement.text.lower()
|
|
||||||
|
|
||||||
# Get default English stopwords and extend with punctuation
|
|
||||||
stopwords = nltk.corpus.stopwords.words('english')
|
|
||||||
stopwords.extend(string.punctuation)
|
|
||||||
stopwords.append('')
|
|
||||||
lemmatizer = nltk.stem.wordnet.WordNetLemmatizer()
|
|
||||||
|
|
||||||
def get_wordnet_pos(pos_tag):
|
|
||||||
if pos_tag[1].startswith('J'):
|
|
||||||
return (pos_tag[0], wordnet.ADJ)
|
|
||||||
elif pos_tag[1].startswith('V'):
|
|
||||||
return (pos_tag[0], wordnet.VERB)
|
|
||||||
elif pos_tag[1].startswith('N'):
|
|
||||||
return (pos_tag[0], wordnet.NOUN)
|
|
||||||
elif pos_tag[1].startswith('R'):
|
|
||||||
return (pos_tag[0], wordnet.ADV)
|
|
||||||
else:
|
|
||||||
return (pos_tag[0], wordnet.NOUN)
|
|
||||||
|
|
||||||
ratio = 0
|
|
||||||
pos_a = map(get_wordnet_pos, nltk.pos_tag(nltk.tokenize.word_tokenize(a)))
|
|
||||||
pos_b = map(get_wordnet_pos, nltk.pos_tag(nltk.tokenize.word_tokenize(b)))
|
|
||||||
lemma_a = [
|
|
||||||
lemmatizer.lemmatize(
|
|
||||||
token.strip(string.punctuation),
|
|
||||||
pos
|
|
||||||
) for token, pos in pos_a if pos == wordnet.NOUN and token.strip(
|
|
||||||
string.punctuation
|
|
||||||
) not in stopwords
|
|
||||||
]
|
|
||||||
lemma_b = [
|
|
||||||
lemmatizer.lemmatize(
|
|
||||||
token.strip(string.punctuation),
|
|
||||||
pos
|
|
||||||
) for token, pos in pos_b if pos == wordnet.NOUN and token.strip(
|
|
||||||
string.punctuation
|
|
||||||
) not in stopwords
|
|
||||||
]
|
|
||||||
|
|
||||||
# Calculate Jaccard similarity
|
|
||||||
try:
|
|
||||||
numerator = len(set(lemma_a).intersection(lemma_b))
|
|
||||||
denominator = float(len(set(lemma_a).union(lemma_b)))
|
|
||||||
ratio = numerator / denominator
|
|
||||||
except Exception as e:
|
|
||||||
print('Error', e)
|
|
||||||
return ratio >= self.SIMILARITY_THRESHOLD
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------- #
|
|
||||||
|
|
||||||
|
|
||||||
levenshtein_distance = LevenshteinDistance()
|
|
||||||
synset_distance = SynsetDistance()
|
|
||||||
sentiment_comparison = SentimentComparison()
|
|
||||||
jaccard_similarity = JaccardSimilarity()
|
|
@ -1,15 +0,0 @@
|
|||||||
"""
|
|
||||||
ChatterBot constants
|
|
||||||
"""
|
|
||||||
|
|
||||||
'''
|
|
||||||
The maximum length of characters that the text of a statement can contain.
|
|
||||||
This should be enforced on a per-model basis by the data model for each
|
|
||||||
storage adapter.
|
|
||||||
'''
|
|
||||||
STATEMENT_TEXT_MAX_LENGTH = 400
|
|
||||||
|
|
||||||
# The maximum length of characters that the name of a tag can contain
|
|
||||||
TAG_NAME_MAX_LENGTH = 50
|
|
||||||
|
|
||||||
DEFAULT_DJANGO_APP_NAME = 'django_chatterbot'
|
|
@ -1,213 +0,0 @@
|
|||||||
class StatementMixin(object):
|
|
||||||
"""
|
|
||||||
This class has shared methods used to
|
|
||||||
normalize different statement models.
|
|
||||||
"""
|
|
||||||
tags = []
|
|
||||||
|
|
||||||
def get_tags(self):
|
|
||||||
"""
|
|
||||||
Return the list of tags for this statement.
|
|
||||||
"""
|
|
||||||
return self.tags
|
|
||||||
|
|
||||||
def add_tags(self, tags):
|
|
||||||
"""
|
|
||||||
Add a list of strings to the statement as tags.
|
|
||||||
"""
|
|
||||||
for tag in tags:
|
|
||||||
self.tags.append(tag)
|
|
||||||
|
|
||||||
|
|
||||||
class Statement(StatementMixin):
|
|
||||||
"""
|
|
||||||
A statement represents a single spoken entity, sentence or
|
|
||||||
phrase that someone can say.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, text, **kwargs):
|
|
||||||
|
|
||||||
# Try not to allow non-string types to be passed to statements
|
|
||||||
try:
|
|
||||||
text = str(text)
|
|
||||||
except UnicodeEncodeError:
|
|
||||||
pass
|
|
||||||
|
|
||||||
self.text = text
|
|
||||||
self.tags = kwargs.pop('tags', [])
|
|
||||||
self.in_response_to = kwargs.pop('in_response_to', [])
|
|
||||||
|
|
||||||
self.extra_data = kwargs.pop('extra_data', {})
|
|
||||||
|
|
||||||
# This is the confidence with which the chat bot believes
|
|
||||||
# this is an accurate response. This value is set when the
|
|
||||||
# statement is returned by the chat bot.
|
|
||||||
self.confidence = 0
|
|
||||||
|
|
||||||
self.storage = None
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return self.text
|
|
||||||
|
|
||||||
def __repr__(self):
|
|
||||||
return '<Statement text:%s>' % (self.text)
|
|
||||||
|
|
||||||
def __hash__(self):
|
|
||||||
return hash(self.text)
|
|
||||||
|
|
||||||
def __eq__(self, other):
|
|
||||||
if not other:
|
|
||||||
return False
|
|
||||||
|
|
||||||
if isinstance(other, Statement):
|
|
||||||
return self.text == other.text
|
|
||||||
|
|
||||||
return self.text == other
|
|
||||||
|
|
||||||
def save(self):
|
|
||||||
"""
|
|
||||||
Save the statement in the database.
|
|
||||||
"""
|
|
||||||
self.storage.update(self)
|
|
||||||
|
|
||||||
def add_extra_data(self, key, value):
|
|
||||||
"""
|
|
||||||
This method allows additional data to be stored on the statement object.
|
|
||||||
|
|
||||||
Typically this data is something that pertains just to this statement.
|
|
||||||
For example, a value stored here might be the tagged parts of speech for
|
|
||||||
each word in the statement text.
|
|
||||||
|
|
||||||
- key = 'pos_tags'
|
|
||||||
- value = [('Now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('different', 'JJ')]
|
|
||||||
|
|
||||||
:param key: The key to use in the dictionary of extra data.
|
|
||||||
:type key: str
|
|
||||||
|
|
||||||
:param value: The value to set for the specified key.
|
|
||||||
"""
|
|
||||||
self.extra_data[key] = value
|
|
||||||
|
|
||||||
def add_response(self, response):
|
|
||||||
"""
|
|
||||||
Add the response to the list of statements that this statement is in response to.
|
|
||||||
If the response is already in the list, increment the occurrence count of that response.
|
|
||||||
|
|
||||||
:param response: The response to add.
|
|
||||||
:type response: `Response`
|
|
||||||
"""
|
|
||||||
if not isinstance(response, Response):
|
|
||||||
raise Statement.InvalidTypeException(
|
|
||||||
'A {} was received when a {} instance was expected'.format(
|
|
||||||
type(response),
|
|
||||||
type(Response(''))
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
updated = False
|
|
||||||
for index in range(0, len(self.in_response_to)):
|
|
||||||
if response.text == self.in_response_to[index].text:
|
|
||||||
self.in_response_to[index].occurrence += 1
|
|
||||||
updated = True
|
|
||||||
|
|
||||||
if not updated:
|
|
||||||
self.in_response_to.append(response)
|
|
||||||
|
|
||||||
def remove_response(self, response_text):
|
|
||||||
"""
|
|
||||||
Removes a response from the statement's response list based
|
|
||||||
on the value of the response text.
|
|
||||||
|
|
||||||
:param response_text: The text of the response to be removed.
|
|
||||||
:type response_text: str
|
|
||||||
"""
|
|
||||||
for response in self.in_response_to:
|
|
||||||
if response_text == response.text:
|
|
||||||
self.in_response_to.remove(response)
|
|
||||||
return True
|
|
||||||
return False
|
|
||||||
|
|
||||||
def get_response_count(self, statement):
|
|
||||||
"""
|
|
||||||
Find the number of times that the statement has been used
|
|
||||||
as a response to the current statement.
|
|
||||||
|
|
||||||
:param statement: The statement object to get the count for.
|
|
||||||
:type statement: `Statement`
|
|
||||||
|
|
||||||
:returns: Return the number of times the statement has been used as a response.
|
|
||||||
:rtype: int
|
|
||||||
"""
|
|
||||||
for response in self.in_response_to:
|
|
||||||
if statement.text == response.text:
|
|
||||||
return response.occurrence
|
|
||||||
|
|
||||||
return 0
|
|
||||||
|
|
||||||
def serialize(self):
|
|
||||||
"""
|
|
||||||
:returns: A dictionary representation of the statement object.
|
|
||||||
:rtype: dict
|
|
||||||
"""
|
|
||||||
data = {'text': self.text, 'in_response_to': [], 'extra_data': self.extra_data}
|
|
||||||
|
|
||||||
for response in self.in_response_to:
|
|
||||||
data['in_response_to'].append(response.serialize())
|
|
||||||
|
|
||||||
return data
|
|
||||||
|
|
||||||
@property
|
|
||||||
def response_statement_cache(self):
|
|
||||||
"""
|
|
||||||
This property is to allow ChatterBot Statement objects to
|
|
||||||
be swappable with Django Statement models.
|
|
||||||
"""
|
|
||||||
return self.in_response_to
|
|
||||||
|
|
||||||
class InvalidTypeException(Exception):
|
|
||||||
|
|
||||||
def __init__(self, value='Received an unexpected value type.'):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
||||||
|
|
||||||
|
|
||||||
class Response(object):
|
|
||||||
"""
|
|
||||||
A response represents an entity which responds to a statement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, text, **kwargs):
|
|
||||||
from datetime import datetime
|
|
||||||
from dateutil import parser as date_parser
|
|
||||||
|
|
||||||
self.text = text
|
|
||||||
self.created_at = kwargs.get('created_at', datetime.now())
|
|
||||||
self.occurrence = kwargs.get('occurrence', 1)
|
|
||||||
|
|
||||||
if not isinstance(self.created_at, datetime):
|
|
||||||
self.created_at = date_parser.parse(self.created_at)
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return self.text
|
|
||||||
|
|
||||||
def __repr__(self):
|
|
||||||
return '<Response text:%s>' % (self.text)
|
|
||||||
|
|
||||||
def __hash__(self):
|
|
||||||
return hash(self.text)
|
|
||||||
|
|
||||||
def __eq__(self, other):
|
|
||||||
if not other:
|
|
||||||
return False
|
|
||||||
|
|
||||||
if isinstance(other, Response):
|
|
||||||
return self.text == other.text
|
|
||||||
|
|
||||||
return self.text == other
|
|
||||||
|
|
||||||
def serialize(self):
|
|
||||||
data = {'text': self.text, 'created_at': self.created_at.isoformat(), 'occurrence': self.occurrence}
|
|
||||||
|
|
||||||
return data
|
|
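The occurrence counting done by `add_response` and `get_response_count` above can be illustrated in isolation. The following is only a minimal sketch: `SketchStatement` and `SketchResponse` are hypothetical stand-ins written for this example, not the classes from this file, and they keep just the fields needed to show the counting behaviour.

```python
from datetime import datetime


class SketchResponse:
    """Hypothetical stand-in for Response: text plus an occurrence counter."""

    def __init__(self, text, occurrence=1):
        self.text = text
        self.occurrence = occurrence
        self.created_at = datetime.now()


class SketchStatement:
    """Hypothetical stand-in for Statement, keeping only the response list."""

    def __init__(self, text):
        self.text = text
        self.in_response_to = []

    def add_response(self, response):
        # Increment the counter when a response with the same text is known
        for known in self.in_response_to:
            if known.text == response.text:
                known.occurrence += 1
                return
        # Otherwise remember the response as a new entry
        self.in_response_to.append(response)

    def get_response_count(self, statement):
        # Look up the counter by text; unknown statements count as zero
        for known in self.in_response_to:
            if known.text == statement.text:
                return known.occurrence
        return 0


reply = SketchStatement('I am fine.')
reply.add_response(SketchResponse('How are you?'))
reply.add_response(SketchResponse('How are you?'))
reply.add_response(SketchResponse('Are you well?'))

count_known = reply.get_response_count(SketchStatement('How are you?'))
count_unknown = reply.get_response_count(SketchStatement('Goodbye'))
```

Adding the same text twice leaves a single entry whose counter has been incremented, which is why the list stays short even when a statement is seen many times.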
@ -1,10 +0,0 @@
|
|||||||
"""
|
|
||||||
Seamlessly import the external chatterbot corpus module.
|
|
||||||
View the corpus on GitHub at https://github.com/gunthercox/chatterbot-corpus
|
|
||||||
"""
|
|
||||||
|
|
||||||
from chatterbot_corpus import Corpus
|
|
||||||
|
|
||||||
__all__ = (
|
|
||||||
'Corpus',
|
|
||||||
)
|
|
@ -1,131 +0,0 @@
|
|||||||
from sqlalchemy import Table, Column, Integer, DateTime, ForeignKey, PickleType
|
|
||||||
from sqlalchemy.ext.declarative import declared_attr, declarative_base
|
|
||||||
from sqlalchemy.orm import relationship
|
|
||||||
from sqlalchemy.sql import func
|
|
||||||
|
|
||||||
from ...constants import TAG_NAME_MAX_LENGTH, STATEMENT_TEXT_MAX_LENGTH
|
|
||||||
from ...conversation import StatementMixin
|
|
||||||
from .types import UnicodeString
|
|
||||||
|
|
||||||
|
|
||||||
class ModelBase(object):
|
|
||||||
"""
|
|
||||||
An augmented base class for SQLAlchemy models.
|
|
||||||
"""
|
|
||||||
|
|
||||||
@declared_attr
|
|
||||||
def __tablename__(cls):
|
|
||||||
"""
|
|
||||||
Return the lowercase class name as the name of the table.
|
|
||||||
"""
|
|
||||||
return cls.__name__.lower()
|
|
||||||
|
|
||||||
id = Column(
|
|
||||||
Integer,
|
|
||||||
primary_key=True,
|
|
||||||
autoincrement=True
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
Base = declarative_base(cls=ModelBase)
|
|
||||||
|
|
||||||
tag_association_table = Table(
|
|
||||||
'tag_association',
|
|
||||||
Base.metadata,
|
|
||||||
Column('tag_id', Integer, ForeignKey('tag.id')),
|
|
||||||
Column('statement_id', Integer, ForeignKey('statement.id'))
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class Tag(Base):
|
|
||||||
"""
|
|
||||||
A tag that describes a statement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
name = Column(UnicodeString(TAG_NAME_MAX_LENGTH))
|
|
||||||
|
|
||||||
|
|
||||||
class Statement(Base, StatementMixin):
|
|
||||||
"""
|
|
||||||
A Statement represents a sentence or phrase.
|
|
||||||
"""
|
|
||||||
|
|
||||||
text = Column(UnicodeString(STATEMENT_TEXT_MAX_LENGTH), unique=True)
|
|
||||||
|
|
||||||
tags = relationship(
|
|
||||||
'Tag',
|
|
||||||
secondary=lambda: tag_association_table,
|
|
||||||
backref='statements'
|
|
||||||
)
|
|
||||||
|
|
||||||
extra_data = Column(PickleType)
|
|
||||||
|
|
||||||
in_response_to = relationship(
|
|
||||||
'Response',
|
|
||||||
back_populates='statement_table'
|
|
||||||
)
|
|
||||||
|
|
||||||
def get_tags(self):
|
|
||||||
"""
|
|
||||||
Return a list of tags for this statement.
|
|
||||||
"""
|
|
||||||
return [tag.name for tag in self.tags]
|
|
||||||
|
|
||||||
def get_statement(self):
|
|
||||||
from ...conversation import Statement as StatementObject
|
|
||||||
from ...conversation import Response as ResponseObject
|
|
||||||
|
|
||||||
statement = StatementObject(
|
|
||||||
self.text,
|
|
||||||
tags=[tag.name for tag in self.tags],
|
|
||||||
extra_data=self.extra_data
|
|
||||||
)
|
|
||||||
for response in self.in_response_to:
|
|
||||||
statement.add_response(
|
|
||||||
ResponseObject(text=response.text, occurrence=response.occurrence)
|
|
||||||
)
|
|
||||||
return statement
|
|
||||||
|
|
||||||
|
|
||||||
class Response(Base):
|
|
||||||
"""
|
|
||||||
A Response row stores one response related to a given statement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
text = Column(UnicodeString(STATEMENT_TEXT_MAX_LENGTH))
|
|
||||||
|
|
||||||
created_at = Column(
|
|
||||||
DateTime(timezone=True),
|
|
||||||
server_default=func.now()
|
|
||||||
)
|
|
||||||
|
|
||||||
occurrence = Column(Integer, default=1)
|
|
||||||
|
|
||||||
statement_text = Column(UnicodeString(STATEMENT_TEXT_MAX_LENGTH), ForeignKey('statement.text'))
|
|
||||||
|
|
||||||
statement_table = relationship(
|
|
||||||
'Statement',
|
|
||||||
back_populates='in_response_to',
|
|
||||||
cascade='all',
|
|
||||||
uselist=False
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
conversation_association_table = Table(
|
|
||||||
'conversation_association',
|
|
||||||
Base.metadata,
|
|
||||||
Column('conversation_id', Integer, ForeignKey('conversation.id')),
|
|
||||||
Column('statement_id', Integer, ForeignKey('statement.id'))
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class Conversation(Base):
|
|
||||||
"""
|
|
||||||
A conversation.
|
|
||||||
"""
|
|
||||||
|
|
||||||
statements = relationship(
|
|
||||||
'Statement',
|
|
||||||
secondary=lambda: conversation_association_table,
|
|
||||||
backref='conversations'
|
|
||||||
)
|
|
@ -1,16 +0,0 @@
|
|||||||
from sqlalchemy.types import TypeDecorator, Unicode
|
|
||||||
|
|
||||||
|
|
||||||
class UnicodeString(TypeDecorator):
|
|
||||||
"""
|
|
||||||
Type for unicode strings.
|
|
||||||
"""
|
|
||||||
|
|
||||||
impl = Unicode
|
|
||||||
|
|
||||||
def process_bind_param(self, value, dialect):
|
|
||||||
"""
|
|
||||||
Coerce Python bytestrings to unicode before
|
|
||||||
saving them to the database.
|
|
||||||
"""
|
|
||||||
return value
|
|
@ -1,47 +0,0 @@
|
|||||||
"""
|
|
||||||
Filters set the base query that gets passed to the storage adapter.
|
|
||||||
"""
|
|
||||||
|
|
||||||
|
|
||||||
class Filter(object):
|
|
||||||
"""
|
|
||||||
A base filter object from which all other
|
|
||||||
filters should be subclassed.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def filter_selection(self, chatterbot, conversation_id):
|
|
||||||
"""
|
|
||||||
Because this is the base filter class, this method just
|
|
||||||
returns the storage adapter's base query. Other filters
|
|
||||||
are expected to override this method.
|
|
||||||
"""
|
|
||||||
return chatterbot.storage.base_query
|
|
||||||
|
|
||||||
|
|
||||||
class RepetitiveResponseFilter(Filter):
|
|
||||||
"""
|
|
||||||
A filter that eliminates possibly repetitive responses to prevent
|
|
||||||
a chat bot from repeating statements that it has recently said.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def filter_selection(self, chatterbot, conversation_id):
|
|
||||||
|
|
||||||
text_of_recent_responses = []
|
|
||||||
|
|
||||||
# TODO: Add a larger quantity of response history
|
|
||||||
latest_response = chatterbot.storage.get_latest_response(conversation_id)
|
|
||||||
if latest_response:
|
|
||||||
text_of_recent_responses.append(latest_response.text)
|
|
||||||
|
|
||||||
# Return the query with no changes if there are no statements to exclude
|
|
||||||
if not text_of_recent_responses:
|
|
||||||
return super(RepetitiveResponseFilter, self).filter_selection(
|
|
||||||
chatterbot,
|
|
||||||
conversation_id
|
|
||||||
)
|
|
||||||
|
|
||||||
query = chatterbot.storage.base_query.statement_text_not_in(
|
|
||||||
text_of_recent_responses
|
|
||||||
)
|
|
||||||
|
|
||||||
return query
|
|
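The idea behind `RepetitiveResponseFilter` above, stripped of the storage adapter, is to exclude the text of the most recent response from the candidate pool. The sketch below assumes plain lists of strings in place of the storage query object; `filter_recent` is a hypothetical helper written for this illustration and is not part of the module.

```python
def filter_recent(candidates, latest_response_text=None):
    """Return the candidates without the most recent response text, if any."""
    text_of_recent_responses = []
    if latest_response_text:
        text_of_recent_responses.append(latest_response_text)

    # Nothing to exclude: hand back the candidates unchanged
    if not text_of_recent_responses:
        return list(candidates)

    return [text for text in candidates if text not in text_of_recent_responses]


candidates = ['Hello', 'How are you?', 'Goodbye']
unfiltered = filter_recent(candidates)
filtered = filter_recent(candidates, 'Hello')
```

Only one response is tracked here, matching the TODO in the filter about adding a larger response history.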
@ -1,17 +0,0 @@
|
|||||||
from .input_adapter import InputAdapter
|
|
||||||
from .gitter import Gitter
|
|
||||||
from .hipchat import HipChat
|
|
||||||
from .mailgun import Mailgun
|
|
||||||
from .microsoft import Microsoft
|
|
||||||
from .terminal import TerminalAdapter
|
|
||||||
from .variable_input_type_adapter import VariableInputTypeAdapter
|
|
||||||
|
|
||||||
__all__ = (
|
|
||||||
'InputAdapter',
|
|
||||||
'Microsoft',
|
|
||||||
'Gitter',
|
|
||||||
'HipChat',
|
|
||||||
'Mailgun',
|
|
||||||
'TerminalAdapter',
|
|
||||||
'VariableInputTypeAdapter',
|
|
||||||
)
|
|
@ -1,178 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from time import sleep
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import InputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class Gitter(InputAdapter):
|
|
||||||
"""
|
|
||||||
An input adapter that allows a ChatterBot instance to get
|
|
||||||
input statements from a Gitter room.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(Gitter, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
self.gitter_host = kwargs.get('gitter_host', 'https://api.gitter.im/v1/')
|
|
||||||
self.gitter_room = kwargs.get('gitter_room')
|
|
||||||
self.gitter_api_token = kwargs.get('gitter_api_token')
|
|
||||||
self.only_respond_to_mentions = kwargs.get('gitter_only_respond_to_mentions', True)
|
|
||||||
self.sleep_time = kwargs.get('gitter_sleep_time', 4)
|
|
||||||
|
|
||||||
authorization_header = 'Bearer {}'.format(self.gitter_api_token)
|
|
||||||
|
|
||||||
self.headers = {
|
|
||||||
'Authorization': authorization_header,
|
|
||||||
'Content-Type': 'application/json',
|
|
||||||
'Accept': 'application/json'
|
|
||||||
}
|
|
||||||
|
|
||||||
# Join the Gitter room
|
|
||||||
room_data = self.join_room(self.gitter_room)
|
|
||||||
self.room_id = room_data.get('id')
|
|
||||||
|
|
||||||
user_data = self.get_user_data()
|
|
||||||
self.user_id = user_data[0].get('id')
|
|
||||||
self.username = user_data[0].get('username')
|
|
||||||
|
|
||||||
def _validate_status_code(self, response):
|
|
||||||
code = response.status_code
|
|
||||||
if code not in [200, 201]:
|
|
||||||
raise self.HTTPStatusException('{} status code received'.format(code))
|
|
||||||
|
|
||||||
def join_room(self, room_name):
|
|
||||||
"""
|
|
||||||
Join the specified Gitter room.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{}rooms'.format(self.gitter_host)
|
|
||||||
response = requests.post(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers,
|
|
||||||
json={'uri': room_name}
|
|
||||||
)
|
|
||||||
self.logger.info('{} joining room {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def get_user_data(self):
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{}user'.format(self.gitter_host)
|
|
||||||
response = requests.get(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers
|
|
||||||
)
|
|
||||||
self.logger.info('{} retrieving user data {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def mark_messages_as_read(self, message_ids):
|
|
||||||
"""
|
|
||||||
Mark the specified message ids as read.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{}user/{}/rooms/{}/unreadItems'.format(
|
|
||||||
self.gitter_host, self.user_id, self.room_id
|
|
||||||
)
|
|
||||||
response = requests.post(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers,
|
|
||||||
json={'chat': message_ids}
|
|
||||||
)
|
|
||||||
self.logger.info('{} marking messages as read {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def get_most_recent_message(self):
|
|
||||||
"""
|
|
||||||
Get the most recent message from the Gitter room.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{}rooms/{}/chatMessages?limit=1'.format(self.gitter_host, self.room_id)
|
|
||||||
response = requests.get(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers
|
|
||||||
)
|
|
||||||
self.logger.info('{} getting most recent message'.format(
|
|
||||||
response.status_code
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
data = response.json()
|
|
||||||
if data:
|
|
||||||
return data[0]
|
|
||||||
return None
|
|
||||||
|
|
||||||
def _contains_mention(self, mentions):
|
|
||||||
for mention in mentions:
|
|
||||||
if self.username == mention.get('screenName'):
|
|
||||||
return True
|
|
||||||
return False
|
|
||||||
|
|
||||||
def should_respond(self, data):
|
|
||||||
"""
|
|
||||||
Takes the API response data from a single message.
|
|
||||||
Returns true if the chat bot should respond.
|
|
||||||
"""
|
|
||||||
if data:
|
|
||||||
unread = data.get('unread', False)
|
|
||||||
|
|
||||||
if self.only_respond_to_mentions:
|
|
||||||
if unread and self._contains_mention(data['mentions']):
|
|
||||||
return True
|
|
||||||
else:
|
|
||||||
return False
|
|
||||||
elif unread:
|
|
||||||
return True
|
|
||||||
|
|
||||||
return False
|
|
||||||
|
|
||||||
def remove_mentions(self, text):
|
|
||||||
"""
|
|
||||||
Return the text with every @mention removed and repeated spaces collapsed.
|
|
||||||
"""
|
|
||||||
import re
|
|
||||||
text_without_mentions = re.sub(r'@\S+', '', text)
|
|
||||||
|
|
||||||
# Remove consecutive spaces
|
|
||||||
text_without_mentions = re.sub(' +', ' ', text_without_mentions.strip())
|
|
||||||
|
|
||||||
return text_without_mentions
|
|
||||||
|
|
||||||
def process_input(self, statement):
|
|
||||||
new_message = False
|
|
||||||
|
|
||||||
while not new_message:
|
|
||||||
data = self.get_most_recent_message()
|
|
||||||
if self.should_respond(data):
|
|
||||||
self.mark_messages_as_read([data['id']])
|
|
||||||
new_message = True
|
|
||||||
sleep(self.sleep_time)
|
|
||||||
|
|
||||||
text = self.remove_mentions(data['text'])
|
|
||||||
statement = Statement(text)
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
class HTTPStatusException(Exception):
|
|
||||||
"""
|
|
||||||
Exception raised when unexpected non-success HTTP
|
|
||||||
status codes are returned in a response.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, value):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
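The two regular expressions used by `Gitter.remove_mentions` above can be tried on their own. This is a small sketch that copies the same patterns into a free function; the sample messages are made up for illustration.

```python
import re


def remove_mentions(text):
    # Drop every token that starts with "@" and runs to the next whitespace
    without_mentions = re.sub(r'@\S+', '', text)
    # Trim the ends and collapse runs of spaces left behind by the removal
    return re.sub(' +', ' ', without_mentions.strip())


leading = remove_mentions('@chatterbot  hello there')
embedded = remove_mentions('hello @chatterbot how are you')
with_punctuation = remove_mentions('thanks @chatterbot!')
```

Because `\S+` matches any run of non-whitespace characters, punctuation attached to a mention (the `!` in the last sample) is removed together with it.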
@ -1,115 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from time import sleep
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import InputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class HipChat(InputAdapter):
|
|
||||||
"""
|
|
||||||
An input adapter that allows a ChatterBot instance to get
|
|
||||||
input statements from a HipChat room.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(HipChat, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
self.hipchat_host = kwargs.get('hipchat_host')
|
|
||||||
self.hipchat_access_token = kwargs.get('hipchat_access_token')
|
|
||||||
self.hipchat_room = kwargs.get('hipchat_room')
|
|
||||||
self.session_id = str(self.chatbot.default_session.uuid)
|
|
||||||
|
|
||||||
import requests
|
|
||||||
self.session = requests.Session()
|
|
||||||
self.session.verify = kwargs.get('ssl_verify', True)
|
|
||||||
|
|
||||||
authorization_header = 'Bearer {}'.format(self.hipchat_access_token)
|
|
||||||
|
|
||||||
self.headers = {
|
|
||||||
'Authorization': authorization_header,
|
|
||||||
'Content-Type': 'application/json'
|
|
||||||
}
|
|
||||||
|
|
||||||
# This is a set of the ids of messages that have already been seen
|
|
||||||
self.recent_message_ids = self.get_initial_ids()
|
|
||||||
|
|
||||||
def get_initial_ids(self):
|
|
||||||
"""
|
|
||||||
Returns a list of the most recent message ids.
|
|
||||||
"""
|
|
||||||
data = self.view_recent_room_history(
|
|
||||||
self.hipchat_room,
|
|
||||||
max_results=75
|
|
||||||
)
|
|
||||||
|
|
||||||
results = set()
|
|
||||||
|
|
||||||
for item in data['items']:
|
|
||||||
results.add(item['id'])
|
|
||||||
|
|
||||||
return results
|
|
||||||
|
|
||||||
def view_recent_room_history(self, room_id_or_name, max_results=1):
|
|
||||||
"""
|
|
||||||
https://www.hipchat.com/docs/apiv2/method/view_recent_room_history
|
|
||||||
"""
|
|
||||||
|
|
||||||
recent_history_url = '{}/v2/room/{}/history?max-results={}'.format(
|
|
||||||
self.hipchat_host,
|
|
||||||
room_id_or_name,
|
|
||||||
max_results
|
|
||||||
)
|
|
||||||
|
|
||||||
response = self.session.get(
|
|
||||||
recent_history_url,
|
|
||||||
headers=self.headers
|
|
||||||
)
|
|
||||||
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def get_most_recent_message(self, room_id_or_name):
|
|
||||||
"""
|
|
||||||
Return the most recent message from the HipChat room.
|
|
||||||
"""
|
|
||||||
data = self.view_recent_room_history(room_id_or_name)
|
|
||||||
|
|
||||||
items = data['items']
|
|
||||||
|
|
||||||
if not items:
|
|
||||||
return None
|
|
||||||
return items[-1]
|
|
||||||
|
|
||||||
def process_input(self, statement):
|
|
||||||
"""
|
|
||||||
Process input from the HipChat room.
|
|
||||||
"""
|
|
||||||
new_message = False
|
|
||||||
|
|
||||||
response_statement = self.chatbot.storage.get_latest_response(
|
|
||||||
self.session_id
|
|
||||||
)
|
|
||||||
|
|
||||||
if response_statement:
|
|
||||||
last_message_id = response_statement.extra_data.get(
|
|
||||||
'hipchat_message_id', None
|
|
||||||
)
|
|
||||||
if last_message_id:
|
|
||||||
self.recent_message_ids.add(last_message_id)
|
|
||||||
|
|
||||||
while not new_message:
|
|
||||||
data = self.get_most_recent_message(self.hipchat_room)
|
|
||||||
|
|
||||||
if data and data['id'] not in self.recent_message_ids:
|
|
||||||
self.recent_message_ids.add(data['id'])
|
|
||||||
new_message = True
|
|
||||||
else:
|
|
||||||
pass
|
|
||||||
sleep(3.5)
|
|
||||||
|
|
||||||
text = data['message']
|
|
||||||
|
|
||||||
statement = Statement(text)
|
|
||||||
statement.add_extra_data('hipchat_message_id', data['id'])
|
|
||||||
|
|
||||||
return statement
|
|
@ -1,34 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from ..adapters import Adapter
|
|
||||||
|
|
||||||
|
|
||||||
class InputAdapter(Adapter):
|
|
||||||
"""
|
|
||||||
This is an abstract class that represents the
|
|
||||||
interface that all input adapters should implement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def process_input(self, *args, **kwargs):
|
|
||||||
"""
|
|
||||||
Returns a statement object based on the input source.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError()
|
|
||||||
|
|
||||||
def process_input_statement(self, *args, **kwargs):
|
|
||||||
"""
|
|
||||||
Return an existing statement object (if one exists).
|
|
||||||
"""
|
|
||||||
input_statement = self.process_input(*args, **kwargs)
|
|
||||||
|
|
||||||
self.logger.info('Received input statement: {}'.format(input_statement.text))
|
|
||||||
|
|
||||||
existing_statement = self.chatbot.storage.find(input_statement.text)
|
|
||||||
|
|
||||||
if existing_statement:
|
|
||||||
self.logger.info('"{}" is a known statement'.format(input_statement.text))
|
|
||||||
input_statement = existing_statement
|
|
||||||
else:
|
|
||||||
self.logger.info('"{}" is not a known statement'.format(input_statement.text))
|
|
||||||
|
|
||||||
return input_statement
|
|
@ -1,63 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
import datetime
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import InputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class Mailgun(InputAdapter):
|
|
||||||
"""
|
|
||||||
Get input from Mailgun.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(Mailgun, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
# Use the bot's name for the name of the sender
|
|
||||||
self.name = kwargs.get('name')
|
|
||||||
self.from_address = kwargs.get('mailgun_from_address')
|
|
||||||
self.api_key = kwargs.get('mailgun_api_key')
|
|
||||||
self.endpoint = kwargs.get('mailgun_api_endpoint')
|
|
||||||
|
|
||||||
def get_email_stored_events(self):
|
|
||||||
import requests
|
|
||||||
|
|
||||||
yesterday = datetime.datetime.now() - datetime.timedelta(1)
|
|
||||||
return requests.get(
|
|
||||||
'{}/events'.format(self.endpoint),
|
|
||||||
auth=('api', self.api_key),
|
|
||||||
params={
|
|
||||||
'begin': yesterday.isoformat(),
|
|
||||||
'ascending': 'yes',
|
|
||||||
'limit': 1
|
|
||||||
}
|
|
||||||
)
|
|
||||||
|
|
||||||
def get_stored_email_urls(self):
|
|
||||||
response = self.get_email_stored_events()
|
|
||||||
data = response.json()
|
|
||||||
|
|
||||||
for item in data.get('items', []):
|
|
||||||
if 'storage' in item:
|
|
||||||
if 'url' in item['storage']:
|
|
||||||
yield item['storage']['url']
|
|
||||||
|
|
||||||
def get_message(self, url):
|
|
||||||
import requests
|
|
||||||
|
|
||||||
return requests.get(
|
|
||||||
url,
|
|
||||||
auth=('api', self.api_key)
|
|
||||||
)
|
|
||||||
|
|
||||||
def process_input(self, statement):
|
|
||||||
urls = self.get_stored_email_urls()
|
|
||||||
url = list(urls)[0]
|
|
||||||
|
|
||||||
response = self.get_message(url)
|
|
||||||
message = response.json()
|
|
||||||
|
|
||||||
text = message.get('stripped-text')
|
|
||||||
|
|
||||||
return Statement(text)
|
|
@ -1,117 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from time import sleep
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import InputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class Microsoft(InputAdapter):
|
|
||||||
"""
|
|
||||||
An input adapter that allows a ChatterBot instance to get
|
|
||||||
input statements from a Microsoft Bot using *Directline client protocol*.
|
|
||||||
https://docs.botframework.com/en-us/restapi/directline/#navtitle
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(Microsoft, self).__init__(**kwargs)
|
|
||||||
import requests
|
|
||||||
from requests.packages.urllib3.exceptions import InsecureRequestWarning
|
|
||||||
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
|
|
||||||
|
|
||||||
self.directline_host = kwargs.get('directline_host', 'https://directline.botframework.com')
|
|
||||||
|
|
||||||
# NOTE: Direct Line client credentials are different from your bot's
|
|
||||||
# credentials
|
|
||||||
self.direct_line_token_or_secret = kwargs. \
|
|
||||||
get('direct_line_token_or_secret')
|
|
||||||
|
|
||||||
authorization_header = 'BotConnector {}'. \
|
|
||||||
format(self.direct_line_token_or_secret)
|
|
||||||
|
|
||||||
self.headers = {
|
|
||||||
'Authorization': authorization_header,
|
|
||||||
'Content-Type': 'application/json',
|
|
||||||
'Accept': 'application/json',
|
|
||||||
'charset': 'utf-8'
|
|
||||||
}
|
|
||||||
|
|
||||||
conversation_data = self.start_conversation()
|
|
||||||
self.conversation_id = conversation_data.get('conversationId')
|
|
||||||
self.conversation_token = conversation_data.get('token')
|
|
||||||
|
|
||||||
def _validate_status_code(self, response):
|
|
||||||
code = response.status_code
|
|
||||||
if not code == 200:
|
|
||||||
raise self.HTTPStatusException('{} status code received'.
|
|
||||||
format(code))
|
|
||||||
|
|
||||||
def start_conversation(self):
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{host}/api/conversations'.format(host=self.directline_host)
|
|
||||||
response = requests.post(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers,
|
|
||||||
verify=False
|
|
||||||
)
|
|
||||||
self.logger.info('{} starting conversation {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def get_most_recent_message(self):
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{host}/api/conversations/{id}/messages' \
|
|
||||||
.format(host=self.directline_host,
|
|
||||||
id=self.conversation_id)
|
|
||||||
|
|
||||||
response = requests.get(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers,
|
|
||||||
verify=False
|
|
||||||
)
|
|
||||||
|
|
||||||
self.logger.info('{} retrieving most recent messages {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
|
|
||||||
self._validate_status_code(response)
|
|
||||||
|
|
||||||
data = response.json()
|
|
||||||
|
|
||||||
if data['messages']:
|
|
||||||
last_msg = int(data['watermark'])
|
|
||||||
return data['messages'][last_msg - 1]
|
|
||||||
return None
|
|
||||||
|
|
||||||
def process_input(self, statement):
|
|
||||||
new_message = False
|
|
||||||
data = None
|
|
||||||
while not new_message:
|
|
||||||
data = self.get_most_recent_message()
|
|
||||||
if data and data['id']:
|
|
||||||
new_message = True
|
|
||||||
else:
|
|
||||||
pass
|
|
||||||
sleep(3.5)
|
|
||||||
|
|
||||||
text = data['text']
|
|
||||||
statement = Statement(text)
|
|
||||||
self.logger.info('processing user statement {}'.format(statement))
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
class HTTPStatusException(Exception):
|
|
||||||
"""
|
|
||||||
Exception raised when unexpected non-success HTTP
|
|
||||||
status codes are returned in a response.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, value):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
@ -1,19 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import InputAdapter
|
|
||||||
from ..utils import input_function
|
|
||||||
|
|
||||||
|
|
||||||
class TerminalAdapter(InputAdapter):
|
|
||||||
"""
|
|
||||||
A simple adapter that allows ChatterBot to
|
|
||||||
communicate through the terminal.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def process_input(self, *args, **kwargs):
|
|
||||||
"""
|
|
||||||
Read the user's input from the terminal.
|
|
||||||
"""
|
|
||||||
user_input = input_function()
|
|
||||||
return Statement(user_input)
|
|
@ -1,61 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import InputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class VariableInputTypeAdapter(InputAdapter):
|
|
||||||
JSON = 'json'
|
|
||||||
TEXT = 'text'
|
|
||||||
OBJECT = 'object'
|
|
||||||
VALID_FORMATS = (JSON, TEXT, OBJECT,)
|
|
||||||
|
|
||||||
def detect_type(self, statement):
|
|
||||||
|
|
||||||
string_types = str
|
|
||||||
|
|
||||||
if hasattr(statement, 'text'):
|
|
||||||
return self.OBJECT
|
|
||||||
if isinstance(statement, string_types):
|
|
||||||
return self.TEXT
|
|
||||||
if isinstance(statement, dict):
|
|
||||||
return self.JSON
|
|
||||||
|
|
||||||
input_type = type(statement)
|
|
||||||
|
|
||||||
raise self.UnrecognizedInputFormatException(
|
|
||||||
'The type {} is not recognized as a valid input type.'.format(
|
|
||||||
input_type
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
def process_input(self, statement):
|
|
||||||
input_type = self.detect_type(statement)
|
|
||||||
|
|
||||||
# Return the statement object without modification
|
|
||||||
if input_type == self.OBJECT:
|
|
||||||
return statement
|
|
||||||
|
|
||||||
# Convert the input string into a statement object
|
|
||||||
if input_type == self.TEXT:
|
|
||||||
return Statement(statement)
|
|
||||||
|
|
||||||
# Convert input dictionary into a statement object
|
|
||||||
if input_type == self.JSON:
|
|
||||||
input_json = dict(statement)
|
|
||||||
text = input_json['text']
|
|
||||||
del input_json['text']
|
|
||||||
|
|
||||||
return Statement(text, **input_json)
|
|
||||||
|
|
||||||
class UnrecognizedInputFormatException(Exception):
|
|
||||||
"""
|
|
||||||
Exception raised when an input format is specified that is
|
|
||||||
not in the VariableInputTypeAdapter.VALID_FORMATS variable.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, value='The input format was not recognized.'):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
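The dispatch performed by `VariableInputTypeAdapter` above (object, text, or dictionary input) can be sketched without the adapter machinery. In this illustration `SketchStatement`, `detect_type` and `to_statement` are hypothetical stand-ins, and a plain `TypeError` replaces `UnrecognizedInputFormatException`.

```python
class SketchStatement:
    """Hypothetical stand-in for Statement with only text and extra_data."""

    def __init__(self, text, **kwargs):
        self.text = text
        self.extra_data = kwargs.pop('extra_data', {})


def detect_type(value):
    # Anything with a "text" attribute is treated as a statement object
    if hasattr(value, 'text'):
        return 'object'
    if isinstance(value, str):
        return 'text'
    if isinstance(value, dict):
        return 'json'
    raise TypeError(
        'The type {} is not recognized as a valid input type.'.format(type(value))
    )


def to_statement(value):
    kind = detect_type(value)
    if kind == 'object':
        return value
    if kind == 'text':
        return SketchStatement(value)
    # Copy the dictionary so the caller's data is not modified
    data = dict(value)
    text = data.pop('text')
    return SketchStatement(text, **data)


from_text = to_statement('Hello')
from_json = to_statement({'text': 'Hello', 'extra_data': {'source': 'api'}})
from_object = to_statement(from_text)

try:
    detect_type(42)
    rejected = False
except TypeError:
    rejected = True
```

The attribute check comes first, so any object exposing `text` passes through untouched, while a dictionary must carry a `text` key and hands its remaining keys to the statement constructor.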
@ -1,19 +0,0 @@
|
|||||||
from .logic_adapter import LogicAdapter
|
|
||||||
from .best_match import BestMatch
|
|
||||||
from .low_confidence import LowConfidenceAdapter
|
|
||||||
from .mathematical_evaluation import MathematicalEvaluation
|
|
||||||
from .multi_adapter import MultiLogicAdapter
|
|
||||||
from .no_knowledge_adapter import NoKnowledgeAdapter
|
|
||||||
from .specific_response import SpecificResponseAdapter
|
|
||||||
from .time_adapter import TimeLogicAdapter
|
|
||||||
|
|
||||||
__all__ = (
|
|
||||||
'LogicAdapter',
|
|
||||||
'BestMatch',
|
|
||||||
'LowConfidenceAdapter',
|
|
||||||
'MathematicalEvaluation',
|
|
||||||
'MultiLogicAdapter',
|
|
||||||
'NoKnowledgeAdapter',
|
|
||||||
'SpecificResponseAdapter',
|
|
||||||
'TimeLogicAdapter',
|
|
||||||
)
|
|
@ -1,85 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from . import LogicAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class BestMatch(LogicAdapter):
|
|
||||||
"""
|
|
||||||
A logic adapter that returns a response based on known responses to
|
|
||||||
the closest matches to the input statement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def get(self, input_statement):
|
|
||||||
"""
|
|
||||||
Takes an input statement and compares it against the stored statements that have known responses.
|
|
||||||
Returns the closest matching statement from the list.
|
|
||||||
"""
|
|
||||||
statement_list = self.chatbot.storage.get_response_statements()
|
|
||||||
|
|
||||||
if not statement_list:
|
|
||||||
if self.chatbot.storage.count():
|
|
||||||
# Use a randomly picked statement
|
|
||||||
self.logger.info(
|
|
||||||
'No statements have known responses. ' +
|
|
||||||
'Choosing a random response to return.'
|
|
||||||
)
|
|
||||||
random_response = self.chatbot.storage.get_random()
|
|
||||||
random_response.confidence = 0
|
|
||||||
return random_response
|
|
||||||
else:
|
|
||||||
raise self.EmptyDatasetException()
|
|
||||||
|
|
||||||
closest_match = input_statement
|
|
||||||
closest_match.confidence = 0
|
|
||||||
|
|
||||||
# Find the closest matching known statement
|
|
||||||
for statement in statement_list:
|
|
||||||
confidence = self.compare_statements(input_statement, statement)
|
|
||||||
|
|
||||||
if confidence > closest_match.confidence:
|
|
||||||
statement.confidence = confidence
|
|
||||||
closest_match = statement
|
|
||||||
|
|
||||||
return closest_match
|
|
||||||
|
|
||||||
def can_process(self, statement):
|
|
||||||
"""
|
|
||||||
Check that the chatbot's storage adapter is available to the logic
|
|
||||||
adapter and there is at least one statement in the database.
|
|
||||||
"""
|
|
||||||
return self.chatbot.storage.count()
|
|
||||||
|
|
||||||
def process(self, input_statement):
|
|
||||||
|
|
||||||
# Select the closest match to the input statement
|
|
||||||
closest_match = self.get(input_statement)
|
|
||||||
self.logger.info('Using "{}" as a close match to "{}"'.format(
|
|
||||||
input_statement.text, closest_match.text
|
|
||||||
))
|
|
||||||
|
|
||||||
# Get all statements that are in response to the closest match
|
|
||||||
response_list = self.chatbot.storage.filter(
|
|
||||||
in_response_to__contains=closest_match.text
|
|
||||||
)
|
|
||||||
|
|
||||||
if response_list:
|
|
||||||
self.logger.info(
|
|
||||||
'Selecting response from {} optimal responses.'.format(
|
|
||||||
len(response_list)
|
|
||||||
)
|
|
||||||
)
|
|
||||||
response = self.select_response(input_statement, response_list)
|
|
||||||
response.confidence = closest_match.confidence
|
|
||||||
self.logger.info('Response selected. Using "{}"'.format(response.text))
|
|
||||||
else:
|
|
||||||
response = self.chatbot.storage.get_random()
|
|
||||||
self.logger.info(
|
|
||||||
'No response to "{}" found. Selecting a random response.'.format(
|
|
||||||
closest_match.text
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
# Set confidence to zero because a random response is selected
|
|
||||||
response.confidence = 0
|
|
||||||
|
|
||||||
return response
|
|
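The closest-match loop in `BestMatch.get` can be sketched in isolation. This is an illustration under stated assumptions: `closest_match` and `similarity` are hypothetical names, plain strings stand in for statement objects, and `difflib.SequenceMatcher` is swapped in for the configured comparison function.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in comparison function returning a score between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_match(input_text, known_texts, compare=similarity):
    """Return (best_text, confidence); falls back to the input with confidence 0."""
    best_text, best_confidence = input_text, 0
    for candidate in known_texts:
        confidence = compare(input_text, candidate)
        # Only a strictly greater score replaces the current best match.
        if confidence > best_confidence:
            best_text, best_confidence = candidate, confidence
    return best_text, best_confidence


if __name__ == '__main__':
    print(closest_match('hello there', ['goodbye', 'hello there']))
```

Because the comparison is strict, the first candidate wins when several candidates tie, and an input that matches nothing is returned with a confidence of 0.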
@ -1,101 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from ..adapters import Adapter
|
|
||||||
from ..utils import import_module
|
|
||||||
|
|
||||||
|
|
||||||
class LogicAdapter(Adapter):
|
|
||||||
"""
|
|
||||||
This is an abstract class that represents the interface
|
|
||||||
that all logic adapters should implement.
|
|
||||||
|
|
||||||
:param statement_comparison_function: The dot-notated import path to a statement comparison function.
|
|
||||||
Defaults to ``levenshtein_distance``.
|
|
||||||
|
|
||||||
:param response_selection_method: A response selection method.
|
|
||||||
Defaults to ``get_first_response``.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(LogicAdapter, self).__init__(**kwargs)
|
|
||||||
from ..comparisons import levenshtein_distance
|
|
||||||
from ..response_selection import get_first_response
|
|
||||||
|
|
||||||
# Import string module parameters
|
|
||||||
if 'statement_comparison_function' in kwargs:
|
|
||||||
import_path = kwargs.get('statement_comparison_function')
|
|
||||||
if isinstance(import_path, str):
|
|
||||||
kwargs['statement_comparison_function'] = import_module(import_path)
|
|
||||||
|
|
||||||
if 'response_selection_method' in kwargs:
|
|
||||||
import_path = kwargs.get('response_selection_method')
|
|
||||||
if isinstance(import_path, str):
|
|
||||||
kwargs['response_selection_method'] = import_module(import_path)
|
|
||||||
|
|
||||||
# By default, compare statements using Levenshtein distance
|
|
||||||
self.compare_statements = kwargs.get(
|
|
||||||
'statement_comparison_function',
|
|
||||||
levenshtein_distance
|
|
||||||
)
|
|
||||||
|
|
||||||
# By default, select the first available response
|
|
||||||
self.select_response = kwargs.get(
|
|
||||||
'response_selection_method',
|
|
||||||
get_first_response
|
|
||||||
)
|
|
||||||
|
|
||||||
def get_initialization_functions(self):
|
|
||||||
"""
|
|
||||||
Return a dictionary of functions to be run once when the chat bot is instantiated.
|
|
||||||
"""
|
|
||||||
return self.compare_statements.get_initialization_functions()
|
|
||||||
|
|
||||||
def initialize(self):
|
|
||||||
for function in self.get_initialization_functions().values():
|
|
||||||
function()
|
|
||||||
|
|
||||||
def can_process(self, statement):
|
|
||||||
"""
|
|
||||||
A preliminary check that is called to determine if a
|
|
||||||
logic adapter can process a given statement. By default,
|
|
||||||
this method returns true but it can be overridden in
|
|
||||||
child classes as needed.
|
|
||||||
|
|
||||||
:rtype: bool
|
|
||||||
"""
|
|
||||||
return True
|
|
||||||
|
|
||||||
def process(self, statement):
|
|
||||||
"""
|
|
||||||
Override this method and implement your logic for selecting a response to an input statement.
|
|
||||||
|
|
||||||
A confidence value and the selected response statement should be returned.
|
|
||||||
The confidence value represents a rating of how accurate the logic adapter
|
|
||||||
expects the selected response to be. Confidence scores are used to select
|
|
||||||
the best response from multiple logic adapters.
|
|
||||||
|
|
||||||
The confidence value should be a number between 0 and 1 where 0 is the
|
|
||||||
lowest confidence level and 1 is the highest.
|
|
||||||
|
|
||||||
:param statement: An input statement to be processed by the logic adapter.
|
|
||||||
:type statement: Statement
|
|
||||||
|
|
||||||
:rtype: Statement
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError()
|
|
||||||
|
|
||||||
@property
|
|
||||||
def class_name(self):
|
|
||||||
"""
|
|
||||||
Return the name of the current logic adapter class.
|
|
||||||
This is typically used for logging and debugging.
|
|
||||||
"""
|
|
||||||
return str(self.__class__.__name__)
|
|
||||||
|
|
||||||
class EmptyDatasetException(Exception):
|
|
||||||
|
|
||||||
def __init__(self, value='An empty set was received when at least one statement was expected.'):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
@ -1,59 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import BestMatch
|
|
||||||
|
|
||||||
|
|
||||||
class LowConfidenceAdapter(BestMatch):
|
|
||||||
"""
|
|
||||||
Returns a default response with a high confidence
|
|
||||||
when a high confidence response is not known.
|
|
||||||
|
|
||||||
:kwargs:
|
|
||||||
* *threshold* (``float``) --
|
|
||||||
The low confidence value that triggers this adapter.
|
|
||||||
Defaults to 0.65.
|
|
||||||
* *default_response* (``str``) or (``iterable``) --
|
|
||||||
The response returned by this logic adapter.
|
|
||||||
* *response_selection_method* (``str``) or (``callable``) --
|
|
||||||
A response selection method.
|
|
||||||
Defaults to ``get_first_response``.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(LowConfidenceAdapter, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
self.confidence_threshold = kwargs.get('threshold', 0.65)
|
|
||||||
|
|
||||||
default_responses = kwargs.get(
|
|
||||||
'default_response', "I'm sorry, I do not understand."
|
|
||||||
)
|
|
||||||
|
|
||||||
# Convert a single string into a list
|
|
||||||
if isinstance(default_responses, str):
|
|
||||||
default_responses = [
|
|
||||||
default_responses
|
|
||||||
]
|
|
||||||
|
|
||||||
self.default_responses = [
|
|
||||||
Statement(text=default) for default in default_responses
|
|
||||||
]
|
|
||||||
|
|
||||||
def process(self, input_statement):
|
|
||||||
"""
|
|
||||||
Return a default response with a high confidence if
|
|
||||||
a high confidence response is not known.
|
|
||||||
"""
|
|
||||||
# Select the closest match to the input statement
|
|
||||||
closest_match = self.get(input_statement)
|
|
||||||
|
|
||||||
# Choose a response from the list of options
|
|
||||||
response = self.select_response(input_statement, self.default_responses)
|
|
||||||
|
|
||||||
# Confidence should be high only if the closest match's confidence is below the threshold
|
|
||||||
if closest_match.confidence < self.confidence_threshold:
|
|
||||||
response.confidence = 1
|
|
||||||
else:
|
|
||||||
response.confidence = 0
|
|
||||||
|
|
||||||
return response
|
|
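The inverted scoring used by `LowConfidenceAdapter` can be shown with a small helper. This is a sketch only: `low_confidence_score` is a hypothetical name, and the 0.65 default mirrors the threshold default shown above.

```python
def low_confidence_score(best_match_confidence, threshold=0.65):
    """Return 1 when the best known match is weak, otherwise 0.

    A score of 1 lets the default response win when the adapter's output is
    compared against other adapters; a score of 0 lets it step aside.
    """
    return 1 if best_match_confidence < threshold else 0


if __name__ == '__main__':
    for score in (0.2, 0.65, 0.9):
        print(score, low_confidence_score(score))
```

The comparison is strict, so a best match that scores exactly at the threshold does not trigger the default response.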
@ -1,68 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from ..conversation import Statement
|
|
||||||
from . import LogicAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class MathematicalEvaluation(LogicAdapter):
|
|
||||||
"""
|
|
||||||
The MathematicalEvaluation logic adapter parses input to determine
|
|
||||||
whether the user is asking a question that requires math to be done.
|
|
||||||
If so, the equation is extracted from the input and returned with
|
|
||||||
the evaluated result.
|
|
||||||
|
|
||||||
For example:
|
|
||||||
User: 'What is three plus five?'
|
|
||||||
Bot: 'Three plus five equals eight'
|
|
||||||
|
|
||||||
:kwargs:
|
|
||||||
* *language* (``str``) --
|
|
||||||
The language is set to 'ENG' for English by default.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(MathematicalEvaluation, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
self.language = kwargs.get('language', 'ENG')
|
|
||||||
self.cache = {}
|
|
||||||
|
|
||||||
def can_process(self, statement):
|
|
||||||
"""
|
|
||||||
Determines whether it is appropriate for this
|
|
||||||
adapter to respond to the user input.
|
|
||||||
"""
|
|
||||||
response = self.process(statement)
|
|
||||||
self.cache[statement.text] = response
|
|
||||||
return response.confidence == 1
|
|
||||||
|
|
||||||
def process(self, statement):
|
|
||||||
"""
|
|
||||||
Takes a statement string.
|
|
||||||
Returns the equation from the statement with the mathematical terms solved.
|
|
||||||
"""
|
|
||||||
from mathparse import mathparse
|
|
||||||
|
|
||||||
input_text = statement.text
|
|
||||||
|
|
||||||
# Use the result cached by the can_process method if it exists
|
|
||||||
if input_text in self.cache:
|
|
||||||
cached_result = self.cache[input_text]
|
|
||||||
self.cache = {}
|
|
||||||
return cached_result
|
|
||||||
|
|
||||||
# Getting the mathematical terms within the input statement
|
|
||||||
expression = mathparse.extract_expression(input_text, language=self.language)
|
|
||||||
|
|
||||||
response = Statement(text=expression)
|
|
||||||
|
|
||||||
try:
|
|
||||||
response.text += ' = ' + str(
|
|
||||||
mathparse.parse(expression, language=self.language)
|
|
||||||
)
|
|
||||||
|
|
||||||
# The confidence is 1 if the expression could be evaluated
|
|
||||||
response.confidence = 1
|
|
||||||
except mathparse.PostfixTokenEvaluationException:
|
|
||||||
response.confidence = 0
|
|
||||||
|
|
||||||
return response
|
|
@ -1,155 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from collections import Counter
|
|
||||||
|
|
||||||
from .. import utils
|
|
||||||
from . import LogicAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class MultiLogicAdapter(LogicAdapter):
|
|
||||||
"""
|
|
||||||
MultiLogicAdapter allows ChatterBot to use multiple logic
|
|
||||||
adapters. It has methods that allow ChatterBot to add an
|
|
||||||
adapter, set the chat bot, and process an input statement
|
|
||||||
to get a response.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(MultiLogicAdapter, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
# Logic adapters added by the chat bot
|
|
||||||
self.adapters = []
|
|
||||||
|
|
||||||
# Required logic adapters that must always be present
|
|
||||||
self.system_adapters = []
|
|
||||||
|
|
||||||
def get_initialization_functions(self):
|
|
||||||
"""
|
|
||||||
Get the initialization functions for each logic adapter.
|
|
||||||
"""
|
|
||||||
functions_dict = {}
|
|
||||||
|
|
||||||
# Iterate over each adapter and get its initialization functions
|
|
||||||
for logic_adapter in self.get_adapters():
|
|
||||||
functions = logic_adapter.get_initialization_functions()
|
|
||||||
functions_dict.update(functions)
|
|
||||||
|
|
||||||
return functions_dict
|
|
||||||
|
|
||||||
def process(self, statement):
|
|
||||||
"""
|
|
||||||
Returns the output of a selection of logic adapters
|
|
||||||
for a given input statement.
|
|
||||||
|
|
||||||
:param statement: The input statement to be processed.
|
|
||||||
"""
|
|
||||||
results = []
|
|
||||||
result = None
|
|
||||||
max_confidence = -1
|
|
||||||
|
|
||||||
for adapter in self.get_adapters():
|
|
||||||
if adapter.can_process(statement):
|
|
||||||
|
|
||||||
output = adapter.process(statement)
|
|
||||||
results.append((output.confidence, output,))
|
|
||||||
|
|
||||||
self.logger.info(
|
|
||||||
'{} selected "{}" as a response with a confidence of {}'.format(
|
|
||||||
adapter.class_name, output.text, output.confidence
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
if output.confidence > max_confidence:
|
|
||||||
result = output
|
|
||||||
max_confidence = output.confidence
|
|
||||||
else:
|
|
||||||
self.logger.info(
|
|
||||||
'Not processing the statement using {}'.format(adapter.class_name)
|
|
||||||
)
|
|
||||||
|
|
||||||
# If multiple adapters agree on the same statement,
|
|
||||||
# then that statement is more likely to be the correct response
|
|
||||||
if len(results) >= 3:
|
|
||||||
statements = [s[1] for s in results]
|
|
||||||
count = Counter(statements)
|
|
||||||
most_common = count.most_common()
|
|
||||||
if most_common[0][1] > 1:
|
|
||||||
result = most_common[0][0]
|
|
||||||
max_confidence = self.get_greatest_confidence(result, results)
|
|
||||||
|
|
||||||
result.confidence = max_confidence
|
|
||||||
return result
|
|
||||||
|
|
||||||
def get_greatest_confidence(self, statement, options):
|
|
||||||
"""
|
|
||||||
Returns the greatest confidence value for a statement that occurs
|
|
||||||
multiple times in the set of options.
|
|
||||||
|
|
||||||
:param statement: A statement object.
|
|
||||||
:param options: A list of tuples in the format (confidence, statement).
|
|
||||||
"""
|
|
||||||
values = []
|
|
||||||
for option in options:
|
|
||||||
if option[1] == statement:
|
|
||||||
values.append(option[0])
|
|
||||||
|
|
||||||
return max(values)
|
|
||||||
|
|
||||||
def get_adapters(self):
|
|
||||||
"""
|
|
||||||
Return a list of all logic adapters being used, including system logic adapters.
|
|
||||||
"""
|
|
||||||
adapters = []
|
|
||||||
adapters.extend(self.adapters)
|
|
||||||
adapters.extend(self.system_adapters)
|
|
||||||
return adapters
|
|
||||||
|
|
||||||
def add_adapter(self, adapter, **kwargs):
|
|
||||||
"""
|
|
||||||
Appends a logic adapter to the list of logic adapters being used.
|
|
||||||
|
|
||||||
:param adapter: The logic adapter to be added.
|
|
||||||
:type adapter: `LogicAdapter`
|
|
||||||
"""
|
|
||||||
utils.validate_adapter_class(adapter, LogicAdapter)
|
|
||||||
adapter = utils.initialize_class(adapter, **kwargs)
|
|
||||||
self.adapters.append(adapter)
|
|
||||||
|
|
||||||
def insert_logic_adapter(self, logic_adapter, insert_index, **kwargs):
|
|
||||||
"""
|
|
||||||
Adds a logic adapter at a specified index.
|
|
||||||
|
|
||||||
:param logic_adapter: The string path to the logic adapter to add.
|
|
||||||
:type logic_adapter: str
|
|
||||||
|
|
||||||
:param insert_index: The index to insert the logic adapter into the list at.
|
|
||||||
:type insert_index: int
|
|
||||||
"""
|
|
||||||
utils.validate_adapter_class(logic_adapter, LogicAdapter)
|
|
||||||
|
|
||||||
NewAdapter = utils.import_module(logic_adapter)
|
|
||||||
adapter = NewAdapter(**kwargs)
|
|
||||||
|
|
||||||
self.adapters.insert(insert_index, adapter)
|
|
||||||
|
|
||||||
def remove_logic_adapter(self, adapter_name):
|
|
||||||
"""
|
|
||||||
Removes a logic adapter from the chat bot.
|
|
||||||
|
|
||||||
:param adapter_name: The class name of the adapter to remove.
|
|
||||||
:type adapter_name: str
|
|
||||||
"""
|
|
||||||
for index, adapter in enumerate(self.adapters):
|
|
||||||
if adapter_name == type(adapter).__name__:
|
|
||||||
del self.adapters[index]
|
|
||||||
return True
|
|
||||||
return False
|
|
||||||
|
|
||||||
def set_chatbot(self, chatbot):
|
|
||||||
"""
|
|
||||||
Set the chatbot for each of the contained logic adapters.
|
|
||||||
"""
|
|
||||||
super(MultiLogicAdapter, self).set_chatbot(chatbot)
|
|
||||||
|
|
||||||
for adapter in self.get_adapters():
|
|
||||||
adapter.set_chatbot(chatbot)
|
|
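The selection rule in `MultiLogicAdapter.process` (highest confidence wins, unless several adapters agree on the same response) can be sketched on plain data. This is an illustration under assumptions: `select_result` is a hypothetical name, and `(confidence, text)` pairs stand in for the statement objects the adapters return.

```python
from collections import Counter


def select_result(results):
    """Pick a response from a list of (confidence, text) pairs."""
    if not results:
        return None

    # Default choice: the first response with the highest confidence.
    confidence, text = max(results, key=lambda pair: pair[0])

    # With three or more results, a response returned by more than one
    # adapter overrides the default choice.
    if len(results) >= 3:
        counts = Counter(candidate for _, candidate in results)
        common_text, occurrences = counts.most_common(1)[0]
        if occurrences > 1:
            text = common_text
            confidence = max(c for c, t in results if t == common_text)

    return text, confidence


if __name__ == '__main__':
    print(select_result([(0.9, 'a'), (0.5, 'b'), (0.6, 'b')]))
    print(select_result([(0.9, 'a'), (0.5, 'b')]))
```

Note that agreement can override a single higher-confidence response: in the first call above, the repeated response is chosen with its own greatest confidence rather than the overall maximum.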
@ -1,27 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from . import LogicAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class NoKnowledgeAdapter(LogicAdapter):
|
|
||||||
"""
|
|
||||||
This is a system adapter that is automatically added
|
|
||||||
to the list of logic adapters during initialization.
|
|
||||||
This adapter is placed at the beginning of the list
|
|
||||||
to be given the highest priority.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def process(self, statement):
|
|
||||||
"""
|
|
||||||
If there are no known responses in the database,
|
|
||||||
then a confidence of 1 should be returned with
|
|
||||||
the input statement.
|
|
||||||
Otherwise, a confidence of 0 should be returned.
|
|
||||||
"""
|
|
||||||
|
|
||||||
if self.chatbot.storage.count():
|
|
||||||
statement.confidence = 0
|
|
||||||
else:
|
|
||||||
statement.confidence = 1
|
|
||||||
|
|
||||||
return statement
|
|
@ -1,39 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from . import LogicAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class SpecificResponseAdapter(LogicAdapter):
|
|
||||||
"""
|
|
||||||
Return a specific response to a specific input.
|
|
||||||
|
|
||||||
:kwargs:
|
|
||||||
* *input_text* (``str``) --
|
|
||||||
The input text that triggers this logic adapter.
|
|
||||||
* *output_text* (``str``) --
|
|
||||||
The output text returned by this logic adapter.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(SpecificResponseAdapter, self).__init__(**kwargs)
|
|
||||||
from ..conversation import Statement
|
|
||||||
|
|
||||||
self.input_text = kwargs.get('input_text')
|
|
||||||
|
|
||||||
output_text = kwargs.get('output_text')
|
|
||||||
self.response_statement = Statement(output_text)
|
|
||||||
|
|
||||||
def can_process(self, statement):
|
|
||||||
if statement == self.input_text:
|
|
||||||
return True
|
|
||||||
|
|
||||||
return False
|
|
||||||
|
|
||||||
def process(self, statement):
|
|
||||||
|
|
||||||
if statement == self.input_text:
|
|
||||||
self.response_statement.confidence = 1
|
|
||||||
else:
|
|
||||||
self.response_statement.confidence = 0
|
|
||||||
|
|
||||||
return self.response_statement
|
|
@ -1,93 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from datetime import datetime
|
|
||||||
|
|
||||||
from . import LogicAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class TimeLogicAdapter(LogicAdapter):
|
|
||||||
"""
|
|
||||||
The TimeLogicAdapter returns the current time.
|
|
||||||
|
|
||||||
:kwargs:
|
|
||||||
* *positive* (``list``) --
|
|
||||||
The time-related questions used to identify time questions.
|
|
||||||
Defaults to a list of English sentences.
|
|
||||||
* *negative* (``list``) --
|
|
||||||
The non-time-related questions used to identify time questions.
|
|
||||||
Defaults to a list of English sentences.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(TimeLogicAdapter, self).__init__(**kwargs)
|
|
||||||
from nltk import NaiveBayesClassifier
|
|
||||||
|
|
||||||
self.positive = kwargs.get('positive', [
|
|
||||||
'what time is it',
|
|
||||||
'hey what time is it',
|
|
||||||
'do you have the time',
|
|
||||||
'do you know the time',
|
|
||||||
'do you know what time it is',
|
|
||||||
'what is the time'
|
|
||||||
])
|
|
||||||
|
|
||||||
self.negative = kwargs.get('negative', [
|
|
||||||
'it is time to go to sleep',
|
|
||||||
'what is your favorite color',
|
|
||||||
'i had a great time',
|
|
||||||
'thyme is my favorite herb',
|
|
||||||
'do you have time to look at my essay',
|
|
||||||
'how do you have the time to do all this',
|
|
||||||
'what is it'
|
|
||||||
])
|
|
||||||
|
|
||||||
labeled_data = (
|
|
||||||
[(name, 0) for name in self.negative] +
|
|
||||||
[(name, 1) for name in self.positive]
|
|
||||||
)
|
|
||||||
|
|
||||||
train_set = [
|
|
||||||
(self.time_question_features(text), n) for (text, n) in labeled_data
|
|
||||||
]
|
|
||||||
|
|
||||||
self.classifier = NaiveBayesClassifier.train(train_set)
|
|
||||||
|
|
||||||
def time_question_features(self, text):
|
|
||||||
"""
|
|
||||||
Provide an analysis of significant features in the string.
|
|
||||||
"""
|
|
||||||
features = {}
|
|
||||||
|
|
||||||
# A list of all words from the known sentences
|
|
||||||
all_words = " ".join(self.positive + self.negative).split()
|
|
||||||
|
|
||||||
# A list of the first word in each of the known sentences
|
|
||||||
all_first_words = []
|
|
||||||
for sentence in self.positive + self.negative:
|
|
||||||
all_first_words.append(
|
|
||||||
sentence.split(' ', 1)[0]
|
|
||||||
)
|
|
||||||
|
|
||||||
for word in text.split():
|
|
||||||
features['first_word({})'.format(word)] = (word in all_first_words)
|
|
||||||
|
|
||||||
for word in text.split():
|
|
||||||
features['contains({})'.format(word)] = (word in all_words)
|
|
||||||
|
|
||||||
for letter in 'abcdefghijklmnopqrstuvwxyz':
|
|
||||||
features['count({})'.format(letter)] = text.lower().count(letter)
|
|
||||||
features['has({})'.format(letter)] = (letter in text.lower())
|
|
||||||
|
|
||||||
return features
|
|
||||||
|
|
||||||
def process(self, statement):
|
|
||||||
from ..conversation import Statement
|
|
||||||
|
|
||||||
now = datetime.now()
|
|
||||||
|
|
||||||
time_features = self.time_question_features(statement.text.lower())
|
|
||||||
confidence = self.classifier.classify(time_features)
|
|
||||||
response = Statement('The current time is ' + now.strftime('%I:%M %p'))
|
|
||||||
|
|
||||||
response.confidence = confidence
|
|
||||||
return response
|
|
@ -1,15 +0,0 @@
|
|||||||
from .output_adapter import OutputAdapter
|
|
||||||
from .gitter import Gitter
|
|
||||||
from .hipchat import HipChat
|
|
||||||
from .mailgun import Mailgun
|
|
||||||
from .microsoft import Microsoft
|
|
||||||
from .terminal import TerminalAdapter
|
|
||||||
|
|
||||||
__all__ = (
|
|
||||||
'OutputAdapter',
|
|
||||||
'Microsoft',
|
|
||||||
'TerminalAdapter',
|
|
||||||
'Mailgun',
|
|
||||||
'Gitter',
|
|
||||||
'HipChat',
|
|
||||||
)
|
|
@ -1,86 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from . import OutputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class Gitter(OutputAdapter):
|
|
||||||
"""
|
|
||||||
An output adapter that allows a ChatterBot instance to send
|
|
||||||
responses to a Gitter room.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(Gitter, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
self.gitter_host = kwargs.get('gitter_host', 'https://api.gitter.im/v1/')
|
|
||||||
self.gitter_room = kwargs.get('gitter_room')
|
|
||||||
self.gitter_api_token = kwargs.get('gitter_api_token')
|
|
||||||
|
|
||||||
authorization_header = 'Bearer {}'.format(self.gitter_api_token)
|
|
||||||
|
|
||||||
self.headers = {
|
|
||||||
'Authorization': authorization_header,
|
|
||||||
'Content-Type': 'application/json; charset=utf-8',
|
|
||||||
'Accept': 'application/json'
|
|
||||||
}
|
|
||||||
|
|
||||||
# Join the Gitter room
|
|
||||||
room_data = self.join_room(self.gitter_room)
|
|
||||||
self.room_id = room_data.get('id')
|
|
||||||
|
|
||||||
def _validate_status_code(self, response):
|
|
||||||
code = response.status_code
|
|
||||||
if code not in [200, 201]:
|
|
||||||
raise self.HTTPStatusException('{} status code received'.format(code))
|
|
||||||
|
|
||||||
def join_room(self, room_name):
|
|
||||||
"""
|
|
||||||
Join the specified Gitter room.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{}rooms'.format(self.gitter_host)
|
|
||||||
response = requests.post(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers,
|
|
||||||
json={'uri': room_name}
|
|
||||||
)
|
|
||||||
self.logger.info('{} status joining room {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def send_message(self, text):
|
|
||||||
"""
|
|
||||||
Send a message to a Gitter room.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
endpoint = '{}rooms/{}/chatMessages'.format(self.gitter_host, self.room_id)
|
|
||||||
response = requests.post(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers,
|
|
||||||
json={'text': text}
|
|
||||||
)
|
|
||||||
self.logger.info('{} sending message to {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def process_response(self, statement, session_id=None):
|
|
||||||
self.send_message(statement.text)
|
|
||||||
return statement
|
|
||||||
|
|
||||||
class HTTPStatusException(Exception):
|
|
||||||
"""
|
|
||||||
Exception raised when unexpected non-success HTTP
|
|
||||||
status codes are returned in a response.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, value):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
@ -1,69 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
import json
|
|
||||||
|
|
||||||
from . import OutputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class HipChat(OutputAdapter):
|
|
||||||
"""
|
|
||||||
An output adapter that allows a ChatterBot instance to send
|
|
||||||
responses to a HipChat room.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(HipChat, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
self.hipchat_host = kwargs.get("hipchat_host")
|
|
||||||
self.hipchat_access_token = kwargs.get("hipchat_access_token")
|
|
||||||
self.hipchat_room = kwargs.get("hipchat_room")
|
|
||||||
|
|
||||||
authorization_header = "Bearer {}".format(self.hipchat_access_token)
|
|
||||||
|
|
||||||
self.headers = {
|
|
||||||
'Authorization': authorization_header,
|
|
||||||
'Content-Type': 'application/json'
|
|
||||||
}
|
|
||||||
|
|
||||||
import requests
|
|
||||||
self.session = requests.Session()
|
|
||||||
self.session.verify = kwargs.get('ssl_verify', True)
|
|
||||||
|
|
||||||
def send_message(self, room_id_or_name, message):
|
|
||||||
"""
|
|
||||||
Send a message to a HipChat room.
|
|
||||||
https://www.hipchat.com/docs/apiv2/method/send_message
|
|
||||||
"""
|
|
||||||
message_url = "{}/v2/room/{}/message".format(
|
|
||||||
self.hipchat_host,
|
|
||||||
room_id_or_name
|
|
||||||
)
|
|
||||||
|
|
||||||
response = self.session.post(
|
|
||||||
message_url,
|
|
||||||
headers=self.headers,
|
|
||||||
data=json.dumps({
|
|
||||||
'message': message
|
|
||||||
})
|
|
||||||
)
|
|
||||||
|
|
||||||
return response.json()
|
|
||||||
|
|
||||||
def reply_to_message(self):
|
|
||||||
"""
|
|
||||||
The HipChat api supports responding to a given message.
|
|
||||||
This may be a good feature to implement in the future to
|
|
||||||
help with multi-user conversations.
|
|
||||||
https://www.hipchat.com/docs/apiv2/method/reply_to_message
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError()
|
|
||||||
|
|
||||||
def process_response(self, statement, session_id=None):
|
|
||||||
data = self.send_message(self.hipchat_room, statement.text)
|
|
||||||
|
|
||||||
# Update the output statement with the message id
|
|
||||||
self.chatbot.storage.update(
|
|
||||||
statement.add_extra_data('hipchat_message_id', data['id'])
|
|
||||||
)
|
|
||||||
|
|
||||||
return statement
|
|
@ -1,50 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from . import OutputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class Mailgun(OutputAdapter):
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(Mailgun, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
# Use the bot's name for the name of the sender
|
|
||||||
self.name = kwargs.get('name')
|
|
||||||
self.from_address = kwargs.get('mailgun_from_address')
|
|
||||||
self.api_key = kwargs.get('mailgun_api_key')
|
|
||||||
self.endpoint = kwargs.get('mailgun_api_endpoint')
|
|
||||||
self.recipients = kwargs.get('mailgun_recipients')
|
|
||||||
|
|
||||||
def send_message(self, subject, text, from_address, recipients):
|
|
||||||
"""
|
|
||||||
* subject: Subject of the email.
|
|
||||||
* text: Text body of the email.
|
|
||||||
* from_address: The email address that the message will be sent from.
|
|
||||||
* recipients: A list of recipient email addresses.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
return requests.post(
|
|
||||||
self.endpoint,
|
|
||||||
auth=('api', self.api_key),
|
|
||||||
data={
|
|
||||||
'from': '%s <%s>' % (self.name, from_address),
|
|
||||||
'to': recipients,
|
|
||||||
'subject': subject,
|
|
||||||
'text': text
|
|
||||||
})
|
|
||||||
|
|
||||||
def process_response(self, statement, session_id=None):
|
|
||||||
"""
|
|
||||||
Send the response statement as an email.
|
|
||||||
"""
|
|
||||||
subject = 'Message from %s' % (self.name)
|
|
||||||
|
|
||||||
self.send_message(
|
|
||||||
subject,
|
|
||||||
statement.text,
|
|
||||||
self.from_address,
|
|
||||||
self.recipients
|
|
||||||
)
|
|
||||||
|
|
||||||
return statement
|
|
@ -1,111 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
import json
|
|
||||||
|
|
||||||
from . import OutputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class Microsoft(OutputAdapter):
|
|
||||||
"""
|
|
||||||
An output adapter that allows a ChatterBot instance to send
|
|
||||||
responses to a Microsoft bot using *Direct Line client protocol*.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(Microsoft, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
self.directline_host = kwargs.get(
|
|
||||||
'directline_host',
|
|
||||||
'https://directline.botframework.com'
|
|
||||||
)
|
|
||||||
self.direct_line_token_or_secret = kwargs.get(
|
|
||||||
'direct_line_token_or_secret'
|
|
||||||
)
|
|
||||||
self.conversation_id = kwargs.get('conversation_id')
|
|
||||||
|
|
||||||
authorization_header = 'BotConnector {}'.format(
|
|
||||||
self.direct_line_token_or_secret
|
|
||||||
)
|
|
||||||
|
|
||||||
self.headers = {
|
|
||||||
'Authorization': authorization_header,
|
|
||||||
'Content-Type': 'application/json'
|
|
||||||
}
|
|
||||||
|
|
||||||
def _validate_status_code(self, response):
|
|
||||||
status_code = response.status_code
|
|
||||||
if status_code not in [200, 204]:
|
|
||||||
raise self.HTTPStatusException('{} status code received'.format(status_code))
|
|
||||||
|
|
||||||
def get_most_recent_message(self):
|
|
||||||
"""
|
|
||||||
Return the most recently sent message.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
endpoint = '{host}/api/conversations/{id}/messages'.format(
|
|
||||||
host=self.directline_host,
|
|
||||||
id=self.conversation_id
|
|
||||||
)
|
|
||||||
|
|
||||||
response = requests.get(
|
|
||||||
endpoint,
|
|
||||||
headers=self.headers,
|
|
||||||
verify=False
|
|
||||||
)
|
|
||||||
|
|
||||||
self.logger.info('{} retrieving most recent messages {}'.format(
|
|
||||||
response.status_code, endpoint
|
|
||||||
))
|
|
||||||
|
|
||||||
self._validate_status_code(response)
|
|
||||||
|
|
||||||
data = response.json()
|
|
||||||
|
|
||||||
if data['messages']:
|
|
||||||
last_msg = int(data['watermark'])
|
|
||||||
return data['messages'][last_msg - 1]
|
|
||||||
return None
|
|
||||||
|
|
||||||
def send_message(self, conversation_id, message):
|
|
||||||
"""
|
|
||||||
Send a message to a Direct Line conversation.
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
message_url = "{host}/api/conversations/{conversationId}/messages".format(
|
|
||||||
host=self.directline_host,
|
|
||||||
conversationId=conversation_id
|
|
||||||
)
|
|
||||||
|
|
||||||
response = requests.post(
|
|
||||||
message_url,
|
|
||||||
headers=self.headers,
|
|
||||||
data=json.dumps({
|
|
||||||
'message': message
|
|
||||||
})
|
|
||||||
)
|
|
||||||
|
|
||||||
self.logger.info('{} sending message {}'.format(
|
|
||||||
response.status_code, message_url
|
|
||||||
))
|
|
||||||
self._validate_status_code(response)
|
|
||||||
# Microsoft returns 204 when the operation succeeded and no content was returned.
|
|
||||||
return self.get_most_recent_message()
|
|
||||||
|
|
||||||
def process_response(self, statement, session_id=None):
|
|
||||||
data = self.send_message(self.conversation_id, statement.text)
|
|
||||||
self.logger.info('processing user response {}'.format(data))
|
|
||||||
return statement
|
|
||||||
|
|
||||||
class HTTPStatusException(Exception):
|
|
||||||
"""
|
|
||||||
Exception raised when unexpected non-success HTTP
|
|
||||||
status codes are returned in a response.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, value):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
@ -1,20 +0,0 @@
|
|||||||
from ..adapters import Adapter
|
|
||||||
|
|
||||||
|
|
||||||
class OutputAdapter(Adapter):
|
|
||||||
"""
|
|
||||||
A generic class that can be overridden by a subclass to provide extended
|
|
||||||
functionality, such as delivering a response to an API endpoint.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def process_response(self, statement, session_id=None):
|
|
||||||
"""
|
|
||||||
Override this method in a subclass to implement customized functionality.
|
|
||||||
|
|
||||||
:param statement: The statement that the chat bot has produced in response to some input.
|
|
||||||
|
|
||||||
:param session_id: The unique id of the current chat session.
|
|
||||||
|
|
||||||
:returns: The response statement.
|
|
||||||
"""
|
|
||||||
return statement
|
|
@ -1,17 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
from . import OutputAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class TerminalAdapter(OutputAdapter):
|
|
||||||
"""
|
|
||||||
A simple adapter that allows ChatterBot to
|
|
||||||
communicate through the terminal.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def process_response(self, statement, session_id=None):
|
|
||||||
"""
|
|
||||||
Print the response to the user's input.
|
|
||||||
"""
|
|
||||||
print(statement.text)
|
|
||||||
return statement.text
|
|
@ -1,752 +0,0 @@
|
|||||||
# -*- coding: utf-8 -*-
|
|
||||||
import calendar
|
|
||||||
import re
|
|
||||||
from datetime import timedelta, datetime
|
|
||||||
|
|
||||||
# Variations of dates that the parser can capture
|
|
||||||
year_variations = ['year', 'years', 'yrs']
|
|
||||||
day_variations = ['days', 'day']
|
|
||||||
minute_variations = ['minute', 'minutes', 'mins']
|
|
||||||
hour_variations = ['hrs', 'hours', 'hour']
|
|
||||||
week_variations = ['weeks', 'week', 'wks']
|
|
||||||
month_variations = ['month', 'months']
|
|
||||||
|
|
||||||
# Variables used for RegEx Matching
|
|
||||||
day_names = 'monday|tuesday|wednesday|thursday|friday|saturday|sunday'
|
|
||||||
month_names_long = (
|
|
||||||
'january|february|march|april|may|june|july|august|september|october|november|december'
|
|
||||||
)
|
|
||||||
month_names = month_names_long + '|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec'
|
|
||||||
day_nearest_names = 'today|yesterday|tomorrow|tonight|tonite'
|
|
||||||
numbers = (
|
|
||||||
'(^a(?=\s)|one|two|three|four|five|six|seven|eight|nine|ten|'
|
|
||||||
'eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|'
|
|
||||||
'eighteen|nineteen|twenty|thirty|forty|fifty|sixty|seventy|'
|
|
||||||
'eighty|ninety|hundred|thousand)'
|
|
||||||
)
|
|
||||||
re_dmy = '(' + '|'.join(day_variations + minute_variations + year_variations + week_variations + month_variations) + ')'
|
|
||||||
re_duration = '(before|after|earlier|later|ago|from\snow)'
|
|
||||||
re_year = '(19|20)\d{2}|^(19|20)\d{2}'
|
|
||||||
re_timeframe = 'this|coming|next|following|previous|last|end\sof\sthe'
|
|
||||||
re_ordinal = 'st|nd|rd|th|first|second|third|fourth|forth|' + re_timeframe
|
|
||||||
re_time = r'(?P<hour>\d{1,2})(\:(?P<minute>\d{1,2})|(?P<convention>am|pm))'
|
|
||||||
re_separator = 'of|at|on'
|
|
||||||
|
|
||||||
# A list of (compiled regular expression, parser function) tuples to match
|
|
||||||
# Start with the widest match and narrow it down because the order of the match in this list matters
|
|
||||||
regex = [
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
((?P<dow>%s)[,\s]\s*)? #Matches Monday, 12 Jan 2012, 12 Jan 2012 etc
|
|
||||||
(?P<day>\d{1,2}) # Matches a digit
|
|
||||||
(%s)?
|
|
||||||
[-\s] # A single hyphen or whitespace character
|
|
||||||
(?P<month>%s) # Matches any month name
|
|
||||||
[-\s] # Space
|
|
||||||
(?P<year>%s) # Year
|
|
||||||
((\s|,\s|\s(%s))?\s*(%s))?
|
|
||||||
)
|
|
||||||
''' % (day_names, re_ordinal, month_names, re_year, re_separator, re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
int(m.group('year') if m.group('year') else base_date.year),
|
|
||||||
HASHMONTHS[m.group('month').strip().lower()],
|
|
||||||
int(m.group('day') if m.group('day') else 1),
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
((?P<dow>%s)[,\s][-\s]*)? #Matches Monday, Jan 12 2012, Jan 12 2012 etc
|
|
||||||
(?P<month>%s) # Matches any month name
|
|
||||||
[-\s] # Space
|
|
||||||
((?P<day>\d{1,2})) # Matches a digit
|
|
||||||
(%s)?
|
|
||||||
([-\s](?P<year>%s))? # Year
|
|
||||||
((\s|,\s|\s(%s))?\s*(%s))?
|
|
||||||
)
|
|
||||||
''' % (day_names, month_names, re_ordinal, re_year, re_separator, re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
int(m.group('year') if m.group('year') else base_date.year),
|
|
||||||
HASHMONTHS[m.group('month').strip().lower()],
|
|
||||||
int(m.group('day') if m.group('day') else 1)
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<month>%s) # Matches any month name
|
|
||||||
[-\s] # A single hyphen or whitespace character
|
|
||||||
(?P<day>\d{1,2}) # Matches a digit
|
|
||||||
(%s)?
|
|
||||||
[-\s]\s*?
|
|
||||||
(?P<year>%s) # Year
|
|
||||||
((\s|,\s|\s(%s))?\s*(%s))?
|
|
||||||
)
|
|
||||||
''' % (month_names, re_ordinal, re_year, re_separator, re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
int(m.group('year') if m.group('year') else base_date.year),
|
|
||||||
HASHMONTHS[m.group('month').strip().lower()],
|
|
||||||
int(m.group('day') if m.group('day') else 1),
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
((?P<number>\d+|(%s[-\s]?)+)\s)? # Matches any number or string 25 or twenty five
|
|
||||||
(?P<unit>%s)s?\s # Matches days, months, years, weeks, minutes
|
|
||||||
(?P<duration>%s) # before, after, earlier, later, ago, from now
|
|
||||||
(\s*(?P<base_time>(%s)))?
|
|
||||||
((\s|,\s|\s(%s))?\s*(%s))?
|
|
||||||
)
|
|
||||||
''' % (numbers, re_dmy, re_duration, day_nearest_names, re_separator, re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: date_from_duration(
|
|
||||||
base_date,
|
|
||||||
m.group('number'),
|
|
||||||
m.group('unit').lower(),
|
|
||||||
m.group('duration').lower(),
|
|
||||||
m.group('base_time')
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<ordinal>%s) # First quarter of 2014
|
|
||||||
\s+
|
|
||||||
quarter\sof
|
|
||||||
\s+
|
|
||||||
(?P<year>%s)
|
|
||||||
)
|
|
||||||
''' % (re_ordinal, re_year),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: date_from_quarter(
|
|
||||||
base_date,
|
|
||||||
HASHORDINALS[m.group('ordinal').lower()],
|
|
||||||
int(m.group('year') if m.group('year') else base_date.year)
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<ordinal_value>\d+)
|
|
||||||
(?P<ordinal>%s) # 1st January 2012
|
|
||||||
((\s|,\s|\s(%s))?\s*)?
|
|
||||||
(?P<month>%s)
|
|
||||||
([,\s]\s*(?P<year>%s))?
|
|
||||||
)
|
|
||||||
''' % (re_ordinal, re_separator, month_names, re_year),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
int(m.group('year') if m.group('year') else base_date.year),
|
|
||||||
int(HASHMONTHS[m.group('month').lower()] if m.group('month') else 1),
|
|
||||||
int(m.group('ordinal_value') if m.group('ordinal_value') else 1),
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<month>%s)
|
|
||||||
\s+
|
|
||||||
(?P<ordinal_value>\d+)
|
|
||||||
(?P<ordinal>%s) # January 1st 2012
|
|
||||||
([,\s]\s*(?P<year>%s))?
|
|
||||||
)
|
|
||||||
''' % (month_names, re_ordinal, re_year),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
int(m.group('year') if m.group('year') else base_date.year),
|
|
||||||
int(HASHMONTHS[m.group('month').lower()] if m.group('month') else 1),
|
|
||||||
int(m.group('ordinal_value') if m.group('ordinal_value') else 1),
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(?P<time>%s) # this, next, following, previous, last
|
|
||||||
\s+
|
|
||||||
((?P<number>\d+|(%s[-\s]?)+)\s)?
|
|
||||||
(?P<dmy>%s) # year, day, week, month, night, minute, min
|
|
||||||
((\s|,\s|\s(%s))?\s*(%s))?
|
|
||||||
''' % (re_timeframe, numbers, re_dmy, re_separator, re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE),
|
|
||||||
),
|
|
||||||
lambda m, base_date: date_from_relative_week_year(
|
|
||||||
base_date,
|
|
||||||
m.group('time'),
|
|
||||||
m.group('dmy'),
|
|
||||||
m.group('number')
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(?P<time>%s) # this, next, following, previous, last
|
|
||||||
\s+
|
|
||||||
(?P<dow>%s) # mon - fri
|
|
||||||
((\s|,\s|\s(%s))?\s*(%s))?
|
|
||||||
''' % (re_timeframe, day_names, re_separator, re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE),
|
|
||||||
),
|
|
||||||
lambda m, base_date: date_from_relative_day(
|
|
||||||
base_date,
|
|
||||||
m.group('time'),
|
|
||||||
m.group('dow')
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<day>\d{1,2}) # Day, Month
|
|
||||||
(%s)
|
|
||||||
[-\s] # A single hyphen or whitespace character
|
|
||||||
(?P<month>%s)
|
|
||||||
)
|
|
||||||
''' % (re_ordinal, month_names),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
base_date.year,
|
|
||||||
HASHMONTHS[m.group('month').strip().lower()],
|
|
||||||
int(m.group('day') if m.group('day') else 1)
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<month>%s) # Month, day
|
|
||||||
[-\s] # A single hyphen or whitespace character
|
|
||||||
((?P<day>\d{1,2})\b) # Matches a digit January 12
|
|
||||||
(%s)?
|
|
||||||
)
|
|
||||||
''' % (month_names, re_ordinal),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
base_date.year,
|
|
||||||
HASHMONTHS[m.group('month').strip().lower()],
|
|
||||||
int(m.group('day') if m.group('day') else 1)
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<month>%s) # Month, year
|
|
||||||
[-\s] # A single hyphen or whitespace character
|
|
||||||
((?P<year>\d{1,4})\b) # Matches a year of up to four digits, as in January 2012
|
|
||||||
)
|
|
||||||
''' % (month_names),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
int(m.group('year')),
|
|
||||||
HASHMONTHS[m.group('month').strip().lower()],
|
|
||||||
1
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<month>\d{1,2}) # MM/DD or MM/DD/YYYY
|
|
||||||
/
|
|
||||||
((?P<day>\d{1,2}))
|
|
||||||
(/(?P<year>%s))?
|
|
||||||
)
|
|
||||||
''' % (re_year),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
int(m.group('year') if m.group('year') else base_date.year),
|
|
||||||
int(m.group('month').strip()),
|
|
||||||
int(m.group('day'))
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(?P<adverb>%s) # today, yesterday, tomorrow, tonight
|
|
||||||
((\s|,\s|\s(%s))?\s*(%s))?
|
|
||||||
''' % (day_nearest_names, re_separator, re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: date_from_adverb(
|
|
||||||
base_date,
|
|
||||||
m.group('adverb')
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(?P<named_day>%s) # Mon - Sun
|
|
||||||
''' % (day_names),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: this_week_day(
|
|
||||||
base_date,
|
|
||||||
HASHWEEKDAYS[m.group('named_day').lower()]
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(?P<year>%s) # Year
|
|
||||||
''' % (re_year),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(int(m.group('year')), 1, 1)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(?P<month>%s) # Month
|
|
||||||
''' % (month_names_long),
|
|
||||||
(re.VERBOSE | re.IGNORECASE)
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
base_date.year,
|
|
||||||
HASHMONTHS[m.group('month').lower()],
|
|
||||||
1
|
|
||||||
)
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(%s) # Matches time 12:00
|
|
||||||
''' % (re_time),
|
|
||||||
(re.VERBOSE | re.IGNORECASE),
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
base_date.year,
|
|
||||||
base_date.month,
|
|
||||||
base_date.day
|
|
||||||
) + timedelta(**convert_time_to_hour_minute(
|
|
||||||
m.group('hour'),
|
|
||||||
m.group('minute'),
|
|
||||||
m.group('convention')
|
|
||||||
))
|
|
||||||
),
|
|
||||||
(
|
|
||||||
re.compile(
|
|
||||||
r'''
|
|
||||||
(
|
|
||||||
(?P<hour>\d+) # Matches 12 hours, 2 hrs
|
|
||||||
\s+
|
|
||||||
(%s)
|
|
||||||
)
|
|
||||||
''' % ('|'.join(hour_variations)),
|
|
||||||
(re.VERBOSE | re.IGNORECASE),
|
|
||||||
),
|
|
||||||
lambda m, base_date: datetime(
|
|
||||||
base_date.year,
|
|
||||||
base_date.month,
|
|
||||||
base_date.day,
|
|
||||||
int(m.group('hour'))
|
|
||||||
)
|
|
||||||
)
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
def hashnum(number):
|
|
||||||
"""
|
|
||||||
Hash of numbers
|
|
||||||
Append more numbers to extend the match
|
|
||||||
"""
|
|
||||||
if re.match(r'one|^a\b', number, re.IGNORECASE):
|
|
||||||
return 1
|
|
||||||
if re.match(r'two', number, re.IGNORECASE):
|
|
||||||
return 2
|
|
||||||
if re.match(r'three', number, re.IGNORECASE):
|
|
||||||
return 3
|
|
||||||
if re.match(r'four', number, re.IGNORECASE):
|
|
||||||
return 4
|
|
||||||
if re.match(r'five', number, re.IGNORECASE):
|
|
||||||
return 5
|
|
||||||
if re.match(r'six', number, re.IGNORECASE):
|
|
||||||
return 6
|
|
||||||
if re.match(r'seven', number, re.IGNORECASE):
|
|
||||||
return 7
|
|
||||||
if re.match(r'eight', number, re.IGNORECASE):
|
|
||||||
return 8
|
|
||||||
if re.match(r'nine', number, re.IGNORECASE):
|
|
||||||
return 9
|
|
||||||
if re.match(r'ten', number, re.IGNORECASE):
|
|
||||||
return 10
|
|
||||||
if re.match(r'eleven', number, re.IGNORECASE):
|
|
||||||
return 11
|
|
||||||
if re.match(r'twelve', number, re.IGNORECASE):
|
|
||||||
return 12
|
|
||||||
if re.match(r'thirteen', number, re.IGNORECASE):
|
|
||||||
return 13
|
|
||||||
if re.match(r'fourteen', number, re.IGNORECASE):
|
|
||||||
return 14
|
|
||||||
if re.match(r'fifteen', number, re.IGNORECASE):
|
|
||||||
return 15
|
|
||||||
if re.match(r'sixteen', number, re.IGNORECASE):
|
|
||||||
return 16
|
|
||||||
if re.match(r'seventeen', number, re.IGNORECASE):
|
|
||||||
return 17
|
|
||||||
if re.match(r'eighteen', number, re.IGNORECASE):
|
|
||||||
return 18
|
|
||||||
if re.match(r'nineteen', number, re.IGNORECASE):
|
|
||||||
return 19
|
|
||||||
if re.match(r'twenty', number, re.IGNORECASE):
|
|
||||||
return 20
|
|
||||||
if re.match(r'thirty', number, re.IGNORECASE):
|
|
||||||
return 30
|
|
||||||
if re.match(r'forty', number, re.IGNORECASE):
|
|
||||||
return 40
|
|
||||||
if re.match(r'fifty', number, re.IGNORECASE):
|
|
||||||
return 50
|
|
||||||
if re.match(r'sixty', number, re.IGNORECASE):
|
|
||||||
return 60
|
|
||||||
if re.match(r'seventy', number, re.IGNORECASE):
|
|
||||||
return 70
|
|
||||||
if re.match(r'eighty', number, re.IGNORECASE):
|
|
||||||
return 80
|
|
||||||
if re.match(r'ninety', number, re.IGNORECASE):
|
|
||||||
return 90
|
|
||||||
if re.match(r'hundred', number, re.IGNORECASE):
|
|
||||||
return 100
|
|
||||||
if re.match(r'thousand', number, re.IGNORECASE):
|
|
||||||
return 1000
|
|
||||||
|
|
||||||
|
|
||||||
def convert_string_to_number(value):
|
|
||||||
"""
|
|
||||||
Convert strings to numbers
|
|
||||||
"""
|
|
||||||
if value is None:
|
|
||||||
return 1
|
|
||||||
if isinstance(value, int):
|
|
||||||
return value
|
|
||||||
if value.isdigit():
|
|
||||||
return int(value)
|
|
||||||
num_list = map(hashnum, re.findall(numbers + '+', value, re.IGNORECASE))
|
|
||||||
return sum(num_list)
|
|
||||||
|
|
||||||
|
|
||||||
def convert_time_to_hour_minute(hour, minute, convention):
|
|
||||||
"""
|
|
||||||
Convert time to hour, minute
|
|
||||||
"""
|
|
||||||
if hour is None:
|
|
||||||
hour = 0
|
|
||||||
if minute is None:
|
|
||||||
minute = 0
|
|
||||||
if convention is None:
|
|
||||||
convention = 'am'
|
|
||||||
|
|
||||||
hour = int(hour)
|
|
||||||
minute = int(minute)
|
|
||||||
|
|
||||||
if convention.lower() == 'pm' and hour < 12:
|
|
||||||
hour += 12
|
|
||||||
|
|
||||||
return {'hours': hour, 'minutes': minute}
|
|
||||||
|
|
||||||
|
|
||||||
def date_from_quarter(base_date, ordinal, year):
|
|
||||||
"""
|
|
||||||
Extract date from quarter of a year
|
|
||||||
"""
|
|
||||||
interval = 3
|
|
||||||
month_start = interval * (ordinal - 1)
|
|
||||||
if month_start < 0:
|
|
||||||
month_start = 9
|
|
||||||
month_end = month_start + interval
|
|
||||||
if month_start == 0:
|
|
||||||
month_start = 1
|
|
||||||
return [
|
|
||||||
datetime(year, month_start, 1),
|
|
||||||
datetime(year, month_end, calendar.monthrange(year, month_end)[1])
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
def date_from_relative_day(base_date, time, dow):
|
|
||||||
"""
|
|
||||||
Converts relative day to time
|
|
||||||
Ex: this tuesday, last tuesday
|
|
||||||
"""
|
|
||||||
# Reset date to start of the day
|
|
||||||
base_date = datetime(base_date.year, base_date.month, base_date.day)
|
|
||||||
time = time.lower()
|
|
||||||
dow = dow.lower()
|
|
||||||
if time == 'this' or time == 'coming':
|
|
||||||
# Else day of week
|
|
||||||
num = HASHWEEKDAYS[dow]
|
|
||||||
return this_week_day(base_date, num)
|
|
||||||
elif time == 'last' or time == 'previous':
|
|
||||||
# Else day of week
|
|
||||||
num = HASHWEEKDAYS[dow]
|
|
||||||
return previous_week_day(base_date, num)
|
|
||||||
elif time == 'next' or time == 'following':
|
|
||||||
# Else day of week
|
|
||||||
num = HASHWEEKDAYS[dow]
|
|
||||||
return next_week_day(base_date, num)
|
|
||||||
|
|
||||||
|
|
||||||
def date_from_relative_week_year(base_date, time, dow, ordinal=1):
|
|
||||||
"""
|
|
||||||
Converts a relative day, week, month, or year to a date
Eg. this week, last month, next year
|
|
||||||
"""
|
|
||||||
# If there is an ordinal (next 3 weeks) => return a start and end range
|
|
||||||
# Reset date to start of the day
|
|
||||||
relative_date = datetime(base_date.year, base_date.month, base_date.day)
|
|
||||||
if dow in year_variations:
|
|
||||||
if time == 'this' or time == 'coming':
|
|
||||||
return datetime(relative_date.year, 1, 1)
|
|
||||||
elif time == 'last' or time == 'previous':
|
|
||||||
return datetime(relative_date.year - 1, relative_date.month, 1)
|
|
||||||
elif time == 'next' or time == 'following':
|
|
||||||
return datetime(relative_date.year + 1, relative_date.month, 1)
|
|
||||||
elif time == 'end of the':
|
|
||||||
return datetime(relative_date.year, 12, 31)
|
|
||||||
elif dow in month_variations:
|
|
||||||
if time == 'this':
|
|
||||||
return datetime(relative_date.year, relative_date.month, relative_date.day)
|
|
||||||
elif time == 'last' or time == 'previous':
|
|
||||||
return datetime(relative_date.year, relative_date.month - 1, relative_date.day)
|
|
||||||
elif time == 'next' or time == 'following':
|
|
||||||
return datetime(relative_date.year, relative_date.month + 1, relative_date.day)
|
|
||||||
elif time == 'end of the':
|
|
||||||
return datetime(
|
|
||||||
relative_date.year,
|
|
||||||
relative_date.month,
|
|
||||||
calendar.monthrange(relative_date.year, relative_date.month)[1]
|
|
||||||
)
|
|
||||||
elif dow in week_variations:
|
|
||||||
if time == 'this':
|
|
||||||
return relative_date - timedelta(days=relative_date.weekday())
|
|
||||||
elif time == 'last' or time == 'previous':
|
|
||||||
return relative_date - timedelta(weeks=1)
|
|
||||||
elif time == 'next' or time == 'following':
|
|
||||||
return relative_date + timedelta(weeks=1)
|
|
||||||
elif time == 'end of the':
|
|
||||||
return relative_date + timedelta(days=6 - relative_date.weekday())
|
|
||||||
elif dow in day_variations:
|
|
||||||
if time == 'this':
|
|
||||||
return relative_date
|
|
||||||
elif time == 'last' or time == 'previous':
|
|
||||||
return relative_date - timedelta(days=1)
|
|
||||||
elif time == 'next' or time == 'following':
|
|
||||||
return relative_date + timedelta(days=1)
|
|
||||||
elif time == 'end of the':
|
|
||||||
return datetime(relative_date.year, relative_date.month, relative_date.day, 23, 59, 59)
|
|
||||||
|
|
||||||
|
|
||||||
def date_from_adverb(base_date, name):
|
|
||||||
"""
|
|
||||||
Convert Day adverbs to dates
|
|
||||||
Tomorrow => Date
|
|
||||||
Today => Date
|
|
||||||
"""
|
|
||||||
# Reset date to start of the day
|
|
||||||
adverb_date = datetime(base_date.year, base_date.month, base_date.day)
|
|
||||||
if name == 'today' or name == 'tonite' or name == 'tonight':
|
|
||||||
return adverb_date
|
|
||||||
elif name == 'yesterday':
|
|
||||||
return adverb_date - timedelta(days=1)
|
|
||||||
elif name == 'tomorrow' or name == 'tom':
|
|
||||||
return adverb_date + timedelta(days=1)
|
|
||||||
|
|
||||||
|
|
||||||
def date_from_duration(base_date, number_as_string, unit, duration, base_time=None):
|
|
||||||
"""
|
|
||||||
Find dates from duration
|
|
||||||
Eg: 20 days from now
|
|
||||||
Currently does not support strings like "20 days from last monday".
|
|
||||||
"""
|
|
||||||
# Check if query is `2 days before yesterday` or `day before yesterday`
|
|
||||||
if base_time is not None:
|
|
||||||
base_date = date_from_adverb(base_date, base_time)
|
|
||||||
num = convert_string_to_number(number_as_string)
|
|
||||||
args = {}
|
|
||||||
if unit in day_variations:
|
|
||||||
args = {'days': num}
|
|
||||||
elif unit in minute_variations:
|
|
||||||
args = {'minutes': num}
|
|
||||||
elif unit in week_variations:
|
|
||||||
args = {'weeks': num}
|
|
||||||
elif unit in month_variations:
|
|
||||||
args = {'days': 365 * num / 12}
|
|
||||||
elif unit in year_variations:
|
|
||||||
args = {'years': num}
|
|
||||||
if duration == 'ago' or duration == 'before' or duration == 'earlier':
|
|
||||||
if 'years' in args:
|
|
||||||
return datetime(base_date.year - args['years'], base_date.month, base_date.day)
|
|
||||||
return base_date - timedelta(**args)
|
|
||||||
elif duration == 'after' or duration == 'later' or duration == 'from now':
|
|
||||||
if 'years' in args:
|
|
||||||
return datetime(base_date.year + args['years'], base_date.month, base_date.day)
|
|
||||||
return base_date + timedelta(**args)
|
|
||||||
|
|
||||||
|
|
||||||
def this_week_day(base_date, weekday):
|
|
||||||
"""
|
|
||||||
Finds coming weekday
|
|
||||||
"""
|
|
||||||
day_of_week = base_date.weekday()
|
|
||||||
# If today is Tuesday and the query is `this monday`
|
|
||||||
# We should output the next_week monday
|
|
||||||
if day_of_week > weekday:
|
|
||||||
return next_week_day(base_date, weekday)
|
|
||||||
start_of_this_week = base_date - timedelta(days=day_of_week + 1)
|
|
||||||
day = start_of_this_week + timedelta(days=1)
|
|
||||||
while day.weekday() != weekday:
|
|
||||||
day = day + timedelta(days=1)
|
|
||||||
return day
|
|
||||||
|
|
||||||
|
|
||||||
def previous_week_day(base_date, weekday):
|
|
||||||
"""
|
|
||||||
Finds previous weekday
|
|
||||||
"""
|
|
||||||
day = base_date - timedelta(days=1)
|
|
||||||
while day.weekday() != weekday:
|
|
||||||
day = day - timedelta(days=1)
|
|
||||||
return day
|
|
||||||
|
|
||||||
|
|
||||||
def next_week_day(base_date, weekday):
|
|
||||||
"""
|
|
||||||
Finds next weekday
|
|
||||||
"""
|
|
||||||
day_of_week = base_date.weekday()
|
|
||||||
end_of_this_week = base_date + timedelta(days=6 - day_of_week)
|
|
||||||
day = end_of_this_week + timedelta(days=1)
|
|
||||||
while day.weekday() != weekday:
|
|
||||||
day = day + timedelta(days=1)
|
|
||||||
return day
|
|
||||||
|
|
||||||
|
|
||||||
# Mapping of Month name and Value
|
|
||||||
HASHMONTHS = {
|
|
||||||
'january': 1,
|
|
||||||
'jan': 1,
|
|
||||||
'february': 2,
|
|
||||||
'feb': 2,
|
|
||||||
'march': 3,
|
|
||||||
'mar': 3,
|
|
||||||
'april': 4,
|
|
||||||
'apr': 4,
|
|
||||||
'may': 5,
|
|
||||||
'june': 6,
|
|
||||||
'jun': 6,
|
|
||||||
'july': 7,
|
|
||||||
'jul': 7,
|
|
||||||
'august': 8,
|
|
||||||
'aug': 8,
|
|
||||||
'september': 9,
|
|
||||||
'sep': 9,
|
|
||||||
'october': 10,
|
|
||||||
'oct': 10,
|
|
||||||
'november': 11,
|
|
||||||
'nov': 11,
|
|
||||||
'december': 12,
|
|
||||||
'dec': 12
|
|
||||||
}
|
|
||||||
|
|
||||||
# Days to number mapping
|
|
||||||
HASHWEEKDAYS = {
|
|
||||||
'monday': 0,
|
|
||||||
'mon': 0,
|
|
||||||
'tuesday': 1,
|
|
||||||
'tue': 1,
|
|
||||||
'wednesday': 2,
|
|
||||||
'wed': 2,
|
|
||||||
'thursday': 3,
|
|
||||||
'thu': 3,
|
|
||||||
'friday': 4,
|
|
||||||
'fri': 4,
|
|
||||||
'saturday': 5,
|
|
||||||
'sat': 5,
|
|
||||||
'sunday': 6,
|
|
||||||
'sun': 6
|
|
||||||
}
|
|
||||||
|
|
||||||
# Ordinal to number
|
|
||||||
HASHORDINALS = {
|
|
||||||
'first': 1,
|
|
||||||
'second': 2,
|
|
||||||
'third': 3,
|
|
||||||
'fourth': 4,
|
|
||||||
'forth': 4,
|
|
||||||
'last': -1
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def datetime_parsing(text, base_date=None):
"""
Extract datetime objects from a string of text.
"""
# Resolve the default at call time rather than once at import time
if base_date is None:
base_date = datetime.now()
matches = []
|
|
||||||
found_array = []
|
|
||||||
|
|
||||||
# Find the position in the string
|
|
||||||
for expression, function in regex:
|
|
||||||
for match in expression.finditer(text):
|
|
||||||
matches.append((match.group(), function(match, base_date), match.span()))
|
|
||||||
|
|
||||||
# Wrap the matched text with TAG element to prevent nested selections
|
|
||||||
for match, value, spans in matches:
|
|
||||||
subn = re.subn(
|
|
||||||
'(?!<TAG[^>]*?>)' + match + '(?![^<]*?</TAG>)', '<TAG>' + match + '</TAG>', text
|
|
||||||
)
|
|
||||||
text = subn[0]
|
|
||||||
is_substituted = subn[1]
|
|
||||||
if is_substituted != 0:
|
|
||||||
found_array.append((match, value, spans))
|
|
||||||
|
|
||||||
# To preserve order of the match, sort based on the start position
|
|
||||||
return sorted(found_array, key=lambda match: match and match[2][0])
|
|
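The number-word handling in `hashnum` and `convert_string_to_number` can be pictured as a table lookup. This is a simplified sketch, not the code from the diff: `WORD_VALUES` and `words_to_number` are hypothetical names, only a few words are included, and the regex-based matching is replaced by splitting on spaces and hyphens. It keeps the additive behaviour, so a multiplier such as "hundred" is simply added to the total.

```python
import re

# Hypothetical subset of the words handled by hashnum()
WORD_VALUES = {
    'a': 1, 'one': 1, 'two': 2, 'three': 3, 'five': 5,
    'twenty': 20, 'thirty': 30, 'hundred': 100,
}


def words_to_number(value):
    """Sum the values of number words, e.g. 'twenty five' -> 25."""
    if value is None:
        return 1  # a missing count is treated as one
    if isinstance(value, int):
        return value
    value = value.strip()
    if value.isdigit():
        return int(value)
    # Split on runs of hyphens or whitespace; unknown words count as zero
    tokens = re.split(r'[-\s]+', value.lower())
    return sum(WORD_VALUES.get(token, 0) for token in tokens)


print(words_to_number('twenty five'))   # 25
print(words_to_number('Thirty-two'))    # 32
print(words_to_number('two hundred'))   # 102, because values are only added
```

Handling multipliers properly would need a small accumulator that multiplies on "hundred" and "thousand"; the sketch leaves that out to stay close to the summing approach.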
@ -1,50 +0,0 @@
|
|||||||
# -*- coding: utf-8 -*-
|
|
||||||
"""
|
|
||||||
Statement pre-processors.
|
|
||||||
"""
|
|
||||||
|
|
||||||
|
|
||||||
def clean_whitespace(chatbot, statement):
|
|
||||||
"""
|
|
||||||
Remove any consecutive whitespace characters from the statement text.
|
|
||||||
"""
|
|
||||||
import re
|
|
||||||
|
|
||||||
# Replace linebreaks and tabs with spaces
|
|
||||||
statement.text = statement.text.replace('\n', ' ').replace('\r', ' ').replace('\t', ' ')
|
|
||||||
|
|
||||||
# Remove any leading or trailing whitespace
|
|
||||||
statement.text = statement.text.strip()
|
|
||||||
|
|
||||||
# Remove consecutive spaces
|
|
||||||
statement.text = re.sub(' +', ' ', statement.text)
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
|
|
||||||
def unescape_html(chatbot, statement):
|
|
||||||
"""
|
|
||||||
Convert escaped html characters into unescaped html characters.
|
|
||||||
For example: "&lt;b&gt;" becomes "<b>".
|
|
||||||
"""
|
|
||||||
|
|
||||||
# Replace HTML escape characters
|
|
||||||
import html
|
|
||||||
|
|
||||||
statement.text = html.unescape(statement.text)
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
|
|
||||||
def convert_to_ascii(chatbot, statement):
|
|
||||||
"""
|
|
||||||
Converts unicode characters to ASCII character equivalents.
|
|
||||||
For example: "på fédéral" becomes "pa federal".
|
|
||||||
"""
|
|
||||||
import unicodedata
|
|
||||||
|
|
||||||
text = unicodedata.normalize('NFKD', statement.text)
|
|
||||||
text = text.encode('ascii', 'ignore').decode('utf-8')
|
|
||||||
|
|
||||||
statement.text = str(text)
|
|
||||||
return statement
|
|
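The three preprocessors above can be combined into one text pipeline. The following is a minimal sketch that works on plain strings rather than statement objects. The function name `preprocess` is an assumption made for illustration, and the step order (whitespace, then HTML entities, then ASCII folding) is one reasonable choice, not something the source prescribes.

```python
import html
import re
import unicodedata


def preprocess(text):
    """Apply whitespace cleanup, HTML unescaping and ASCII folding."""
    # Replace linebreaks and tabs with spaces, trim, collapse runs of spaces
    text = text.replace('\n', ' ').replace('\r', ' ').replace('\t', ' ')
    text = re.sub(' +', ' ', text.strip())

    # Turn escaped HTML entities back into characters
    text = html.unescape(text)

    # Decompose accented characters, then drop the non-ASCII combining marks
    text = unicodedata.normalize('NFKD', text)
    return text.encode('ascii', 'ignore').decode('ascii')


print(preprocess("  på\tfédéral \n &lt;b&gt; "))
```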
@ -1,71 +0,0 @@
|
|||||||
"""
|
|
||||||
Response selection methods determine which response should be used in
|
|
||||||
the event that multiple responses are generated within a logic adapter.
|
|
||||||
"""
|
|
||||||
import logging
|
|
||||||
|
|
||||||
|
|
||||||
def get_most_frequent_response(input_statement, response_list):
|
|
||||||
"""
|
|
||||||
:param input_statement: A statement that closely matches an input to the chat bot.
|
|
||||||
:type input_statement: Statement
|
|
||||||
|
|
||||||
:param response_list: A list of statement options to choose a response from.
|
|
||||||
:type response_list: list
|
|
||||||
|
|
||||||
:return: The response statement with the greatest number of occurrences.
|
|
||||||
:rtype: Statement
|
|
||||||
"""
|
|
||||||
matching_response = None
|
|
||||||
occurrence_count = -1
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
logger.info(u'Selecting response with greatest number of occurrences.')
|
|
||||||
|
|
||||||
for statement in response_list:
|
|
||||||
count = statement.get_response_count(input_statement)
|
|
||||||
|
|
||||||
# Keep the more common statement
|
|
||||||
if count >= occurrence_count:
|
|
||||||
matching_response = statement
|
|
||||||
occurrence_count = count
|
|
||||||
|
|
||||||
# Choose the most commonly occurring matching response
|
|
||||||
return matching_response
|
|
||||||
|
|
||||||
|
|
||||||
def get_first_response(input_statement, response_list):
|
|
||||||
"""
|
|
||||||
:param input_statement: A statement that closely matches an input to the chat bot.
|
|
||||||
:type input_statement: Statement
|
|
||||||
|
|
||||||
:param response_list: A list of statement options to choose a response from.
|
|
||||||
:type response_list: list
|
|
||||||
|
|
||||||
:return: Return the first statement in the response list.
|
|
||||||
:rtype: Statement
|
|
||||||
"""
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
logger.info(u'Selecting first response from list of {} options.'.format(
|
|
||||||
len(response_list)
|
|
||||||
))
|
|
||||||
return response_list[0]
|
|
||||||
|
|
||||||
|
|
||||||
def get_random_response(input_statement, response_list):
|
|
||||||
"""
|
|
||||||
:param input_statement: A statement that closely matches an input to the chat bot.
|
|
||||||
:type input_statement: Statement
|
|
||||||
|
|
||||||
:param response_list: A list of statement options to choose a response from.
|
|
||||||
:type response_list: list
|
|
||||||
|
|
||||||
:return: Choose a random response from the selection.
|
|
||||||
:rtype: Statement
|
|
||||||
"""
|
|
||||||
from random import choice
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
logger.info(u'Selecting a response from list of {} options.'.format(
|
|
||||||
len(response_list)
|
|
||||||
))
|
|
||||||
return choice(response_list)
|
|
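The selection loop in `get_most_frequent_response` compares with `>=`, so when two candidates have the same count the one that appears later in the list is kept. The sketch below reproduces only that loop. The function name `most_frequent` and the `count_of` callback are hypothetical stand-ins for `statement.get_response_count(input_statement)`.

```python
def most_frequent(responses, count_of):
    """Return the response with the highest count; later items win ties."""
    best = None
    best_count = -1

    for response in responses:
        count = count_of(response)

        # ">=" means an equal count replaces the earlier candidate
        if count >= best_count:
            best = response
            best_count = count

    return best


# Hypothetical occurrence counts for three candidate responses
counts = {"hi": 2, "hello": 3, "hey": 3}
print(most_frequent(["hi", "hello", "hey"], counts.get))
```

An empty candidate list yields `None`, which callers would need to handle.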
@ -1,9 +0,0 @@
|
|||||||
from .storage_adapter import StorageAdapter
|
|
||||||
from .mongodb import MongoDatabaseAdapter
|
|
||||||
from .sql_storage import SQLStorageAdapter
|
|
||||||
|
|
||||||
__all__ = (
|
|
||||||
'StorageAdapter',
|
|
||||||
'MongoDatabaseAdapter',
|
|
||||||
'SQLStorageAdapter',
|
|
||||||
)
|
|
@ -1,397 +0,0 @@
|
|||||||
from . import StorageAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class Query(object):
|
|
||||||
|
|
||||||
def __init__(self, query=None):
|
|
||||||
if query is None:
|
|
||||||
self.query = {}
|
|
||||||
else:
|
|
||||||
self.query = query
|
|
||||||
|
|
||||||
def value(self):
|
|
||||||
return self.query.copy()
|
|
||||||
|
|
||||||
def raw(self, data):
|
|
||||||
query = self.query.copy()
|
|
||||||
|
|
||||||
query.update(data)
|
|
||||||
|
|
||||||
return Query(query)
|
|
||||||
|
|
||||||
def statement_text_equals(self, statement_text):
|
|
||||||
query = self.query.copy()
|
|
||||||
|
|
||||||
query['text'] = statement_text
|
|
||||||
|
|
||||||
return Query(query)
|
|
||||||
|
|
||||||
def statement_text_not_in(self, statements):
|
|
||||||
query = self.query.copy()
|
|
||||||
|
|
||||||
if 'text' not in query:
|
|
||||||
query['text'] = {}
|
|
||||||
|
|
||||||
if '$nin' not in query['text']:
|
|
||||||
query['text']['$nin'] = []
|
|
||||||
|
|
||||||
query['text']['$nin'].extend(statements)
|
|
||||||
|
|
||||||
return Query(query)
|
|
||||||
|
|
||||||
def statement_response_list_contains(self, statement_text):
|
|
||||||
query = self.query.copy()
|
|
||||||
|
|
||||||
if 'in_response_to' not in query:
|
|
||||||
query['in_response_to'] = {}
|
|
||||||
|
|
||||||
if '$elemMatch' not in query['in_response_to']:
|
|
||||||
query['in_response_to']['$elemMatch'] = {}
|
|
||||||
|
|
||||||
query['in_response_to']['$elemMatch']['text'] = statement_text
|
|
||||||
|
|
||||||
return Query(query)
|
|
||||||
|
|
||||||
def statement_response_list_equals(self, response_list):
|
|
||||||
query = self.query.copy()
|
|
||||||
|
|
||||||
query['in_response_to'] = response_list
|
|
||||||
|
|
||||||
return Query(query)
|
|
||||||
|
|
||||||
|
|
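The `Query` class above returns a new `Query` from every method, but `dict.copy()` is shallow, so nested dictionaries such as `query['text']` are still shared between the old and new objects. The sketch below shows the same builder pattern using `copy.deepcopy`, so that chained calls cannot alter an earlier query. `QuerySketch` and `text_not_in` are hypothetical names, not ChatterBot's API.

```python
import copy


class QuerySketch:
    """Hypothetical immutable-style builder for Mongo-like filter dicts."""

    def __init__(self, query=None):
        self.query = {} if query is None else query

    def value(self):
        # Hand out a deep copy so callers cannot mutate internal state
        return copy.deepcopy(self.query)

    def text_not_in(self, statements):
        # Deep copy keeps the nested 'text' dict private to the new object
        query = copy.deepcopy(self.query)
        query.setdefault('text', {}).setdefault('$nin', []).extend(statements)
        return QuerySketch(query)


first = QuerySketch().text_not_in(['hello'])
second = first.text_not_in(['goodbye'])
print(first.value())
print(second.value())
```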
||||||
class MongoDatabaseAdapter(StorageAdapter):
|
|
||||||
"""
|
|
||||||
The MongoDatabaseAdapter is an interface that allows
|
|
||||||
ChatterBot to store statements in a MongoDB database.
|
|
||||||
|
|
||||||
:keyword database: The name of the database you wish to connect to.
|
|
||||||
:type database: str
|
|
||||||
|
|
||||||
.. code-block:: python
|
|
||||||
|
|
||||||
database='chatterbot-database'
|
|
||||||
|
|
||||||
:keyword database_uri: The URI of a remote instance of MongoDB.
|
|
||||||
:type database_uri: str
|
|
||||||
|
|
||||||
.. code-block:: python
|
|
||||||
|
|
||||||
database_uri='mongodb://example.com:8100/'
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(MongoDatabaseAdapter, self).__init__(**kwargs)
|
|
||||||
from pymongo import MongoClient
|
|
||||||
from pymongo.errors import OperationFailure
|
|
||||||
|
|
||||||
self.database_name = self.kwargs.get(
|
|
||||||
'database', 'chatterbot-database'
|
|
||||||
)
|
|
||||||
self.database_uri = self.kwargs.get(
|
|
||||||
'database_uri', 'mongodb://localhost:27017/'
|
|
||||||
)
|
|
||||||
|
|
||||||
# Use the default host and port
|
|
||||||
self.client = MongoClient(self.database_uri)
|
|
||||||
|
|
||||||
# Increase the sort buffer to 42M if possible
|
|
||||||
try:
|
|
||||||
self.client.admin.command({'setParameter': 1, 'internalQueryExecMaxBlockingSortBytes': 44040192})
|
|
||||||
except OperationFailure:
|
|
||||||
pass
|
|
||||||
|
|
||||||
# Specify the name of the database
|
|
||||||
self.database = self.client[self.database_name]
|
|
||||||
|
|
||||||
# The mongo collection of statement documents
|
|
||||||
self.statements = self.database['statements']
|
|
||||||
|
|
||||||
# The mongo collection of conversation documents
|
|
||||||
self.conversations = self.database['conversations']
|
|
||||||
|
|
||||||
# Set a requirement for the text attribute to be unique
|
|
||||||
self.statements.create_index('text', unique=True)
|
|
||||||
|
|
||||||
self.base_query = Query()
|
|
||||||
|
|
||||||
def get_statement_model(self):
|
|
||||||
"""
|
|
||||||
Return the class for the statement model.
|
|
||||||
"""
|
|
||||||
from ..conversation import Statement
|
|
||||||
|
|
||||||
# Create a storage-aware statement
|
|
||||||
statement = Statement
|
|
||||||
statement.storage = self
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
def get_response_model(self):
|
|
||||||
"""
|
|
||||||
Return the class for the response model.
|
|
||||||
"""
|
|
||||||
from ..conversation import Response
|
|
||||||
|
|
||||||
# Create a storage-aware response
|
|
||||||
response = Response
|
|
||||||
response.storage = self
|
|
||||||
|
|
||||||
return response
|
|
||||||
|
|
||||||
def count(self):
|
|
||||||
return self.statements.count()
|
|
||||||
|
|
||||||
def find(self, statement_text):
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
query = self.base_query.statement_text_equals(statement_text)
|
|
||||||
|
|
||||||
values = self.statements.find_one(query.value())
|
|
||||||
|
|
||||||
if not values:
|
|
||||||
return None
|
|
||||||
|
|
||||||
del values['text']
|
|
||||||
|
|
||||||
# Build the objects for the response list
|
|
||||||
values['in_response_to'] = self.deserialize_responses(
|
|
||||||
values.get('in_response_to', [])
|
|
||||||
)
|
|
||||||
|
|
||||||
return Statement(statement_text, **values)
|
|
||||||
|
|
||||||
def deserialize_responses(self, response_list):
|
|
||||||
"""
|
|
||||||
Takes the list of response items and returns
|
|
||||||
the list converted to Response objects.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
Response = self.get_model('response')
|
|
||||||
proxy_statement = Statement('')
|
|
||||||
|
|
||||||
for response in response_list:
|
|
||||||
text = response['text']
|
|
||||||
del response['text']
|
|
||||||
|
|
||||||
proxy_statement.add_response(
|
|
||||||
Response(text, **response)
|
|
||||||
)
|
|
||||||
|
|
||||||
return proxy_statement.in_response_to
|
|
||||||
|
|
||||||
def mongo_to_object(self, statement_data):
|
|
||||||
"""
|
|
||||||
Return Statement object when given data
|
|
||||||
returned from Mongo DB.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
statement_text = statement_data['text']
|
|
||||||
del statement_data['text']
|
|
||||||
|
|
||||||
statement_data['in_response_to'] = self.deserialize_responses(
|
|
||||||
statement_data.get('in_response_to', [])
|
|
||||||
)
|
|
||||||
|
|
||||||
return Statement(statement_text, **statement_data)
|
|
||||||
|
|
||||||
def filter(self, **kwargs):
|
|
||||||
"""
|
|
||||||
Returns a list of statements in the database
|
|
||||||
that match the parameters specified.
|
|
||||||
"""
|
|
||||||
import pymongo
|
|
||||||
|
|
||||||
query = self.base_query
|
|
||||||
|
|
||||||
order_by = kwargs.pop('order_by', None)
|
|
||||||
|
|
||||||
# Convert Response objects to data
|
|
||||||
if 'in_response_to' in kwargs:
|
|
||||||
serialized_responses = []
|
|
||||||
for response in kwargs['in_response_to']:
|
|
||||||
serialized_responses.append({'text': response})
|
|
||||||
|
|
||||||
query = query.statement_response_list_equals(serialized_responses)
|
|
||||||
del kwargs['in_response_to']
|
|
||||||
|
|
||||||
if 'in_response_to__contains' in kwargs:
|
|
||||||
query = query.statement_response_list_contains(
|
|
||||||
kwargs['in_response_to__contains']
|
|
||||||
)
|
|
||||||
del kwargs['in_response_to__contains']
|
|
||||||
|
|
||||||
query = query.raw(kwargs)
|
|
||||||
|
|
||||||
matches = self.statements.find(query.value())
|
|
||||||
|
|
||||||
if order_by:
|
|
||||||
|
|
||||||
direction = pymongo.ASCENDING
|
|
||||||
|
|
||||||
# Sort so that newer datetimes appear first
|
|
||||||
if order_by == 'created_at':
|
|
||||||
direction = pymongo.DESCENDING
|
|
||||||
|
|
||||||
matches = matches.sort(order_by, direction)
|
|
||||||
|
|
||||||
results = []
|
|
||||||
|
|
||||||
for match in list(matches):
|
|
||||||
results.append(self.mongo_to_object(match))
|
|
||||||
|
|
||||||
return results
|
|
||||||
|
|
||||||
def update(self, statement):
|
|
||||||
from pymongo import UpdateOne
|
|
||||||
from pymongo.errors import BulkWriteError
|
|
||||||
|
|
||||||
data = statement.serialize()
|
|
||||||
|
|
||||||
operations = []
|
|
||||||
|
|
||||||
update_operation = UpdateOne(
|
|
||||||
{'text': statement.text},
|
|
||||||
{'$set': data},
|
|
||||||
upsert=True
|
|
||||||
)
|
|
||||||
operations.append(update_operation)
|
|
||||||
|
|
||||||
# Make sure that an entry for each response is saved
|
|
||||||
for response_dict in data.get('in_response_to', []):
|
|
||||||
response_text = response_dict.get('text')
|
|
||||||
|
|
||||||
# Upsert an entry for the response text ($set also updates an existing document)
|
|
||||||
update_operation = UpdateOne(
|
|
||||||
{'text': response_text},
|
|
||||||
{'$set': response_dict},
|
|
||||||
upsert=True
|
|
||||||
)
|
|
||||||
operations.append(update_operation)
|
|
||||||
|
|
||||||
try:
|
|
||||||
self.statements.bulk_write(operations, ordered=False)
|
|
||||||
except BulkWriteError as bwe:
|
|
||||||
# Log the details of a bulk write error
|
|
||||||
self.logger.error(str(bwe.details))
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
def create_conversation(self):
|
|
||||||
"""
|
|
||||||
Create a new conversation.
|
|
||||||
"""
|
|
||||||
conversation_id = self.conversations.insert_one({}).inserted_id
|
|
||||||
return conversation_id
|
|
||||||
|
|
||||||
def get_latest_response(self, conversation_id):
|
|
||||||
"""
|
|
||||||
Returns the latest response in a conversation if it exists.
|
|
||||||
Returns None if a matching conversation cannot be found.
|
|
||||||
"""
|
|
||||||
from pymongo import DESCENDING
|
|
||||||
|
|
||||||
statements = list(self.statements.find({
|
|
||||||
'conversations.id': conversation_id
|
|
||||||
}).sort('conversations.created_at', DESCENDING))
|
|
||||||
|
|
||||||
if not statements:
|
|
||||||
return None
|
|
||||||
|
|
||||||
return self.mongo_to_object(statements[-2])
|
|
||||||
|
|
||||||
def add_to_conversation(self, conversation_id, statement, response):
|
|
||||||
"""
|
|
||||||
Add the statement and response to the conversation.
|
|
||||||
"""
|
|
||||||
from datetime import datetime, timedelta
|
|
||||||
self.statements.update_one(
|
|
||||||
{
|
|
||||||
'text': statement.text
|
|
||||||
},
|
|
||||||
{
|
|
||||||
'$push': {
|
|
||||||
'conversations': {
|
|
||||||
'id': conversation_id,
|
|
||||||
'created_at': datetime.utcnow()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
)
|
|
||||||
self.statements.update_one(
|
|
||||||
{
|
|
||||||
'text': response.text
|
|
||||||
},
|
|
||||||
{
|
|
||||||
'$push': {
|
|
||||||
'conversations': {
|
|
||||||
'id': conversation_id,
|
|
||||||
# Force the response to be at least one millisecond after the input statement
|
|
||||||
'created_at': datetime.utcnow() + timedelta(milliseconds=1)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
)
|
|
||||||
|
|
||||||
def get_random(self):
|
|
||||||
"""
|
|
||||||
Returns a random statement from the database
|
|
||||||
"""
|
|
||||||
from random import randint
|
|
||||||
|
|
||||||
count = self.count()
|
|
||||||
|
|
||||||
if count < 1:
|
|
||||||
raise self.EmptyDatabaseException()
|
|
||||||
|
|
||||||
random_integer = randint(0, count - 1)
|
|
||||||
|
|
||||||
statements = self.statements.find().limit(1).skip(random_integer)
|
|
||||||
|
|
||||||
return self.mongo_to_object(list(statements)[0])
|
|
||||||
|
|
||||||
def remove(self, statement_text):
|
|
||||||
"""
|
|
||||||
Removes the statement that matches the input text.
|
|
||||||
Removes any responses from statements if the response text matches the
|
|
||||||
input text.
|
|
||||||
"""
|
|
||||||
for statement in self.filter(in_response_to__contains=statement_text):
|
|
||||||
statement.remove_response(statement_text)
|
|
||||||
self.update(statement)
|
|
||||||
|
|
||||||
self.statements.delete_one({'text': statement_text})
|
|
||||||
|
|
||||||
def get_response_statements(self):
|
|
||||||
"""
|
|
||||||
Return only statements that are in response to another statement.
|
|
||||||
A statement must exist which lists the closest matching statement in the
|
|
||||||
in_response_to field. Otherwise, the logic adapter may find a closest
|
|
||||||
matching statement that does not have a known response.
|
|
||||||
"""
|
|
||||||
response_query = self.statements.aggregate([{'$group': {'_id': '$in_response_to.text'}}])
|
|
||||||
|
|
||||||
responses = []
|
|
||||||
for r in response_query:
|
|
||||||
try:
|
|
||||||
responses.extend(r['_id'])
|
|
||||||
except TypeError:
|
|
||||||
pass
|
|
||||||
|
|
||||||
_statement_query = {
|
|
||||||
'text': {
|
|
||||||
'$in': responses
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
_statement_query.update(self.base_query.value())
|
|
||||||
statement_query = self.statements.find(_statement_query)
|
|
||||||
statement_objects = []
|
|
||||||
for statement in list(statement_query):
|
|
||||||
statement_objects.append(self.mongo_to_object(statement))
|
|
||||||
return statement_objects
|
|
||||||
|
|
||||||
def drop(self):
|
|
||||||
"""
|
|
||||||
Remove the database.
|
|
||||||
"""
|
|
||||||
self.client.drop_database(self.database_name)
|
|
@ -1,403 +0,0 @@
|
|||||||
from . import StorageAdapter
|
|
||||||
|
|
||||||
|
|
||||||
def get_response_table(response):
|
|
||||||
from ..ext.sqlalchemy_app.models import Response
|
|
||||||
return Response(text=response.text, occurrence=response.occurrence)
|
|
||||||
|
|
||||||
|
|
||||||
class SQLStorageAdapter(StorageAdapter):
|
|
||||||
"""
|
|
||||||
SQLStorageAdapter allows ChatterBot to store conversation
|
|
||||||
data in a semi-structured T-SQL database, virtually any database
|
|
||||||
that SQL Alchemy supports.
|
|
||||||
|
|
||||||
Notes:
|
|
||||||
Tables may change (and will), so, save your training data.
|
|
||||||
There is no data migration (yet).
|
|
||||||
Performance test not done yet.
|
|
||||||
Tests using other databases not finished.
|
|
||||||
|
|
||||||
All parameters are optional, by default a sqlite database is used.
|
|
||||||
|
|
||||||
It will check if tables are present, if they are not, it will attempt
|
|
||||||
to create the required tables.
|
|
||||||
|
|
||||||
:keyword database: Used for sqlite database. Ignored if database_uri is specified.
|
|
||||||
:type database: str
|
|
||||||
|
|
||||||
:keyword database_uri: e.g. "sqlite:///database_test.db". Use database_uri or database;
|
|
||||||
database_uri can be specified to choose database driver (database parameter will be ignored).
|
|
||||||
:type database_uri: str
|
|
||||||
|
|
||||||
:keyword read_only: False by default, makes all operations read only, has priority over all DB operations
|
|
||||||
so, create, update, delete will NOT be executed
|
|
||||||
:type read_only: bool
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, **kwargs):
|
|
||||||
super(SQLStorageAdapter, self).__init__(**kwargs)
|
|
||||||
|
|
||||||
from sqlalchemy import create_engine
|
|
||||||
from sqlalchemy.orm import sessionmaker
|
|
||||||
|
|
||||||
default_uri = "sqlite:///db.sqlite3"
|
|
||||||
|
|
||||||
database_name = self.kwargs.get("database", False)
|
|
||||||
|
|
||||||
# None results in a sqlite in-memory database as the default
|
|
||||||
if database_name is None:
|
|
||||||
default_uri = "sqlite://"
|
|
||||||
|
|
||||||
self.database_uri = self.kwargs.get(
|
|
||||||
"database_uri", default_uri
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create a sqlite file if a database name is provided
|
|
||||||
if database_name:
|
|
||||||
self.database_uri = "sqlite:///" + database_name
|
|
||||||
|
|
||||||
self.engine = create_engine(self.database_uri, convert_unicode=True)
|
|
||||||
|
|
||||||
from re import search
|
|
||||||
|
|
||||||
if search('^sqlite://', self.database_uri):
|
|
||||||
from sqlalchemy.engine import Engine
|
|
||||||
from sqlalchemy import event
|
|
||||||
|
|
||||||
@event.listens_for(Engine, "connect")
|
|
||||||
def set_sqlite_pragma(dbapi_connection, connection_record):
|
|
||||||
dbapi_connection.execute('PRAGMA journal_mode=WAL')
|
|
||||||
dbapi_connection.execute('PRAGMA synchronous=NORMAL')
|
|
||||||
|
|
||||||
self.read_only = self.kwargs.get(
|
|
||||||
"read_only", False
|
|
||||||
)
|
|
||||||
|
|
||||||
if not self.engine.dialect.has_table(self.engine, 'Statement'):
|
|
||||||
self.create()
|
|
||||||
|
|
||||||
self.Session = sessionmaker(bind=self.engine, expire_on_commit=True)
|
|
||||||
|
|
||||||
# ChatterBot's internal query builder is not yet supported for this adapter
|
|
||||||
self.adapter_supports_queries = False
|
|
||||||
|
|
||||||
def get_statement_model(self):
|
|
||||||
"""
|
|
||||||
Return the statement model.
|
|
||||||
"""
|
|
||||||
from ..ext.sqlalchemy_app.models import Statement
|
|
||||||
return Statement
|
|
||||||
|
|
||||||
def get_response_model(self):
|
|
||||||
"""
|
|
||||||
Return the response model.
|
|
||||||
"""
|
|
||||||
from ..ext.sqlalchemy_app.models import Response
|
|
||||||
return Response
|
|
||||||
|
|
||||||
def get_conversation_model(self):
|
|
||||||
"""
|
|
||||||
Return the conversation model.
|
|
||||||
"""
|
|
||||||
from ..ext.sqlalchemy_app.models import Conversation
|
|
||||||
return Conversation
|
|
||||||
|
|
||||||
def get_tag_model(self):
|
|
||||||
"""
|
|
||||||
Return the tag model.
|
|
||||||
"""
|
|
||||||
from ..ext.sqlalchemy_app.models import Tag
|
|
||||||
return Tag
|
|
||||||
|
|
||||||
def count(self):
|
|
||||||
"""
|
|
||||||
Return the number of entries in the database.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
|
|
||||||
session = self.Session()
|
|
||||||
statement_count = session.query(Statement).count()
|
|
||||||
session.close()
|
|
||||||
return statement_count
|
|
||||||
|
|
||||||
def find(self, statement_text):
|
|
||||||
"""
|
|
||||||
Returns a statement if it exists, otherwise None.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
session = self.Session()
|
|
||||||
|
|
||||||
query = session.query(Statement).filter_by(text=statement_text)
|
|
||||||
record = query.first()
|
|
||||||
if record:
|
|
||||||
statement = record.get_statement()
|
|
||||||
session.close()
|
|
||||||
return statement
|
|
||||||
|
|
||||||
session.close()
|
|
||||||
return None
|
|
||||||
|
|
||||||
def remove(self, statement_text):
|
|
||||||
"""
|
|
||||||
Removes the statement that matches the input text.
|
|
||||||
Removes any responses from statements where the response text matches
|
|
||||||
the input text.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
session = self.Session()
|
|
||||||
|
|
||||||
query = session.query(Statement).filter_by(text=statement_text)
|
|
||||||
record = query.first()
|
|
||||||
|
|
||||||
session.delete(record)
|
|
||||||
|
|
||||||
self._session_finish(session)
|
|
||||||
|
|
||||||
def filter(self, **kwargs):
|
|
||||||
"""
|
|
||||||
Returns a list of objects from the database.
|
|
||||||
The kwargs parameter can contain any number
|
|
||||||
of attributes. Only objects which contain
|
|
||||||
all listed attributes and in which all values
|
|
||||||
match for all listed attributes will be returned.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
Response = self.get_model('response')
|
|
||||||
|
|
||||||
session = self.Session()
|
|
||||||
|
|
||||||
filter_parameters = kwargs.copy()
|
|
||||||
|
|
||||||
statements = []
|
|
||||||
_query = None
|
|
||||||
|
|
||||||
if len(filter_parameters) == 0:
|
|
||||||
_response_query = session.query(Statement)
|
|
||||||
statements.extend(_response_query.all())
|
|
||||||
else:
|
|
||||||
for i, fp in enumerate(filter_parameters):
|
|
||||||
_filter = filter_parameters[fp]
|
|
||||||
if fp in ['in_response_to', 'in_response_to__contains']:
|
|
||||||
_response_query = session.query(Statement)
|
|
||||||
if isinstance(_filter, list):
|
|
||||||
if len(_filter) == 0:
|
|
||||||
_query = _response_query.filter(
|
|
||||||
Statement.in_response_to == None  # NOQA Here must use == instead of is
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
for f in _filter:
|
|
||||||
_query = _response_query.filter(
|
|
||||||
Statement.in_response_to.contains(get_response_table(f)))
|
|
||||||
else:
|
|
||||||
if fp == 'in_response_to__contains':
|
|
||||||
_query = _response_query.join(Response).filter(Response.text == _filter)
|
|
||||||
else:
|
|
||||||
_query = _response_query.filter(Statement.in_response_to == None)  # NOQA
|
|
||||||
else:
|
|
||||||
if _query:
|
|
||||||
_query = _query.filter(Response.statement_text.like('%' + _filter + '%'))
|
|
||||||
else:
|
|
||||||
_response_query = session.query(Response)
|
|
||||||
_query = _response_query.filter(Response.statement_text.like('%' + _filter + '%'))
|
|
||||||
|
|
||||||
if _query is None:
|
|
||||||
return []
|
|
||||||
if len(filter_parameters) == i + 1:
|
|
||||||
statements.extend(_query.all())
|
|
||||||
|
|
||||||
results = []
|
|
||||||
|
|
||||||
for statement in statements:
|
|
||||||
if isinstance(statement, Response):
|
|
||||||
if statement and statement.statement_table:
|
|
||||||
results.append(statement.statement_table.get_statement())
|
|
||||||
else:
|
|
||||||
if statement:
|
|
||||||
results.append(statement.get_statement())
|
|
||||||
|
|
||||||
session.close()
|
|
||||||
|
|
||||||
return results
|
|
||||||
|
|
||||||
def update(self, statement):
|
|
||||||
"""
|
|
||||||
Modifies an entry in the database.
|
|
||||||
Creates an entry if one does not exist.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
Response = self.get_model('response')
|
|
||||||
Tag = self.get_model('tag')
|
|
||||||
|
|
||||||
if statement:
|
|
||||||
session = self.Session()
|
|
||||||
|
|
||||||
query = session.query(Statement).filter_by(text=statement.text)
|
|
||||||
record = query.first()
|
|
||||||
|
|
||||||
# Create a new statement entry if one does not already exist
|
|
||||||
if not record:
|
|
||||||
record = Statement(text=statement.text)
|
|
||||||
|
|
||||||
record.extra_data = dict(statement.extra_data)
|
|
||||||
|
|
||||||
for _tag in statement.tags:
|
|
||||||
tag = session.query(Tag).filter_by(name=_tag).first()
|
|
||||||
|
|
||||||
if not tag:
|
|
||||||
# Create the record
|
|
||||||
tag = Tag(name=_tag)
|
|
||||||
|
|
||||||
record.tags.append(tag)
|
|
||||||
|
|
||||||
# Get or create the response records as needed
|
|
||||||
for response in statement.in_response_to:
|
|
||||||
_response = session.query(Response).filter_by(
|
|
||||||
text=response.text,
|
|
||||||
statement_text=statement.text
|
|
||||||
).first()
|
|
||||||
|
|
||||||
if _response:
|
|
||||||
_response.occurrence += 1
|
|
||||||
else:
|
|
||||||
# Create the record
|
|
||||||
_response = Response(
|
|
||||||
text=response.text,
|
|
||||||
statement_text=statement.text,
|
|
||||||
occurrence=response.occurrence
|
|
||||||
)
|
|
||||||
|
|
||||||
record.in_response_to.append(_response)
|
|
||||||
|
|
||||||
session.add(record)
|
|
||||||
|
|
||||||
self._session_finish(session)
|
|
||||||
|
|
||||||
def create_conversation(self):
|
|
||||||
"""
|
|
||||||
Create a new conversation.
|
|
||||||
"""
|
|
||||||
Conversation = self.get_model('conversation')
|
|
||||||
|
|
||||||
session = self.Session()
|
|
||||||
conversation = Conversation()
|
|
||||||
|
|
||||||
session.add(conversation)
|
|
||||||
session.flush()
|
|
||||||
|
|
||||||
session.refresh(conversation)
|
|
||||||
conversation_id = conversation.id
|
|
||||||
|
|
||||||
session.commit()
|
|
||||||
session.close()
|
|
||||||
|
|
||||||
return conversation_id
|
|
||||||
|
|
||||||
def add_to_conversation(self, conversation_id, statement, response):
|
|
||||||
"""
|
|
||||||
Add the statement and response to the conversation.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
Conversation = self.get_model('conversation')
|
|
||||||
|
|
||||||
session = self.Session()
|
|
||||||
conversation = session.query(Conversation).get(conversation_id)
|
|
||||||
|
|
||||||
statement_query = session.query(Statement).filter_by(
|
|
||||||
text=statement.text
|
|
||||||
).first()
|
|
||||||
response_query = session.query(Statement).filter_by(
|
|
||||||
text=response.text
|
|
||||||
).first()
|
|
||||||
|
|
||||||
# Make sure the statements exist
|
|
||||||
if not statement_query:
|
|
||||||
self.update(statement)
|
|
||||||
statement_query = session.query(Statement).filter_by(
|
|
||||||
text=statement.text
|
|
||||||
).first()
|
|
||||||
|
|
||||||
if not response_query:
|
|
||||||
self.update(response)
|
|
||||||
response_query = session.query(Statement).filter_by(
|
|
||||||
text=response.text
|
|
||||||
).first()
|
|
||||||
|
|
||||||
conversation.statements.append(statement_query)
|
|
||||||
conversation.statements.append(response_query)
|
|
||||||
|
|
||||||
session.add(conversation)
|
|
||||||
self._session_finish(session)
|
|
||||||
|
|
||||||
def get_latest_response(self, conversation_id):
|
|
||||||
"""
|
|
||||||
Returns the latest response in a conversation if it exists.
|
|
||||||
Returns None if a matching conversation cannot be found.
|
|
||||||
"""
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
|
|
||||||
session = self.Session()
|
|
||||||
statement = None
|
|
||||||
|
|
||||||
statement_query = session.query(Statement).filter(
|
|
||||||
Statement.conversations.any(id=conversation_id)
|
|
||||||
).order_by(Statement.id)
|
|
||||||
|
|
||||||
if statement_query.count() >= 2:
|
|
||||||
statement = statement_query[-2].get_statement()
|
|
||||||
|
|
||||||
# Handle the case of the first statement in the list
|
|
||||||
elif statement_query.count() == 1:
|
|
||||||
statement = statement_query[0].get_statement()
|
|
||||||
|
|
||||||
session.close()
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
def get_random(self):
|
|
||||||
"""
|
|
||||||
Returns a random statement from the database
|
|
||||||
"""
|
|
||||||
import random
|
|
||||||
|
|
||||||
Statement = self.get_model('statement')
|
|
||||||
|
|
||||||
session = self.Session()
|
|
||||||
count = self.count()
|
|
||||||
if count < 1:
|
|
||||||
raise self.EmptyDatabaseException()
|
|
||||||
|
|
||||||
rand = random.randrange(0, count)
|
|
||||||
stmt = session.query(Statement)[rand]
|
|
||||||
|
|
||||||
statement = stmt.get_statement()
|
|
||||||
|
|
||||||
session.close()
|
|
||||||
return statement
|
|
||||||
|
|
||||||
def drop(self):
|
|
||||||
"""
|
|
||||||
Drop the database attached to a given adapter.
|
|
||||||
"""
|
|
||||||
from ..ext.sqlalchemy_app.models import Base
|
|
||||||
Base.metadata.drop_all(self.engine)
|
|
||||||
|
|
||||||
def create(self):
|
|
||||||
"""
|
|
||||||
Populate the database with the tables.
|
|
||||||
"""
|
|
||||||
from ..ext.sqlalchemy_app.models import Base
|
|
||||||
Base.metadata.create_all(self.engine)
|
|
||||||
|
|
||||||
def _session_finish(self, session, statement_text=None):
|
|
||||||
from sqlalchemy.exc import InvalidRequestError
|
|
||||||
try:
|
|
||||||
if not self.read_only:
|
|
||||||
session.commit()
|
|
||||||
else:
|
|
||||||
session.rollback()
|
|
||||||
except InvalidRequestError:
|
|
||||||
# Log the statement text and the exception
|
|
||||||
self.logger.exception(statement_text)
|
|
||||||
finally:
|
|
||||||
session.close()
|
|
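The URI selection at the start of `SQLStorageAdapter.__init__` can be isolated as a pure function, which makes the precedence between `database` and `database_uri` easier to see. This is a sketch that mirrors the logic shown in the constructor. The function name `resolve_database_uri` is hypothetical, and no engine is created.

```python
def resolve_database_uri(**kwargs):
    """Mirror the constructor's choice of SQLAlchemy database URI."""
    default_uri = "sqlite:///db.sqlite3"
    database_name = kwargs.get("database", False)

    # database=None selects an in-memory sqlite database as the default
    if database_name is None:
        default_uri = "sqlite://"

    database_uri = kwargs.get("database_uri", default_uri)

    # A non-empty database name is applied last, so it overrides database_uri
    if database_name:
        database_uri = "sqlite:///" + database_name

    return database_uri


print(resolve_database_uri())
print(resolve_database_uri(database=None))
print(resolve_database_uri(database="chat.db", database_uri="sqlite:///other.db"))
```

As the last call shows, the code as written lets a non-empty `database` value win over `database_uri`, which is worth keeping in mind when reading the class docstring.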
@ -1,174 +0,0 @@
|
|||||||
import logging
|
|
||||||
|
|
||||||
|
|
||||||
class StorageAdapter(object):
|
|
||||||
"""
|
|
||||||
This is an abstract class that represents the interface
|
|
||||||
that all storage adapters should implement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, base_query=None, *args, **kwargs):
|
|
||||||
"""
|
|
||||||
Initialize common attributes shared by all storage adapters.
|
|
||||||
"""
|
|
||||||
self.kwargs = kwargs
|
|
||||||
self.logger = kwargs.get('logger', logging.getLogger(__name__))
|
|
||||||
self.adapter_supports_queries = True
|
|
||||||
self.base_query = None
|
|
||||||
|
|
||||||
def get_model(self, model_name):
|
|
||||||
"""
|
|
||||||
Return the model class for a given model name.
|
|
||||||
"""
|
|
||||||
|
|
||||||
# The string must be lowercase
|
|
||||||
model_name = model_name.lower()
|
|
||||||
|
|
||||||
kwarg_model_key = '%s_model' % (model_name,)
|
|
||||||
|
|
||||||
if kwarg_model_key in self.kwargs:
|
|
||||||
return self.kwargs.get(kwarg_model_key)
|
|
||||||
|
|
||||||
get_model_method = getattr(self, 'get_%s_model' % (model_name,))
|
|
||||||
|
|
||||||
return get_model_method()
|
|
||||||
|
|
||||||
def generate_base_query(self, chatterbot, session_id):
|
|
||||||
"""
|
|
||||||
Create a base query for the storage adapter.
|
|
||||||
"""
|
|
||||||
if self.adapter_supports_queries:
|
|
||||||
for filter_instance in chatterbot.filters:
|
|
||||||
self.base_query = filter_instance.filter_selection(chatterbot, session_id)
|
|
||||||
|
|
||||||
def count(self):
|
|
||||||
"""
|
|
||||||
Return the number of entries in the database.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `count` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def find(self, statement_text):
|
|
||||||
"""
|
|
||||||
Returns an object from the database if it exists
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `find` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def remove(self, statement_text):
|
|
||||||
"""
|
|
||||||
Removes the statement that matches the input text.
|
|
||||||
Removes any responses from statements where the response text matches
|
|
||||||
the input text.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `remove` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def filter(self, **kwargs):
|
|
||||||
"""
|
|
||||||
Returns a list of objects from the database.
|
|
||||||
The kwargs parameter can contain any number
|
|
||||||
of attributes. Only objects which contain
|
|
||||||
all listed attributes and in which all values
|
|
||||||
match for all listed attributes will be returned.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `filter` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def update(self, statement):
|
|
||||||
"""
|
|
||||||
Modifies an entry in the database.
|
|
||||||
Creates an entry if one does not exist.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `update` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def get_latest_response(self, conversation_id):
|
|
||||||
"""
|
|
||||||
Returns the latest response in a conversation if it exists.
|
|
||||||
Returns None if a matching conversation cannot be found.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `get_latest_response` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def create_conversation(self):
|
|
||||||
"""
|
|
||||||
Creates a new conversation.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `create_conversation` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def add_to_conversation(self, conversation_id, statement, response):
|
|
||||||
"""
|
|
||||||
Add the statement and response to the conversation.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `add_to_conversation` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def get_random(self):
|
|
||||||
"""
|
|
||||||
Returns a random statement from the database.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `get_random` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def drop(self):
|
|
||||||
"""
|
|
||||||
Drop the database attached to a given adapter.
|
|
||||||
"""
|
|
||||||
raise self.AdapterMethodNotImplementedError(
|
|
||||||
'The `drop` method is not implemented by this adapter.'
|
|
||||||
)
|
|
||||||
|
|
||||||
def get_response_statements(self):
|
|
||||||
"""
|
|
||||||
Return only statements that are in response to another statement.
|
|
||||||
A statement must exist which lists the closest matching statement in the
|
|
||||||
in_response_to field. Otherwise, the logic adapter may find a closest
|
|
||||||
matching statement that does not have a known response.
|
|
||||||
|
|
||||||
This method may be overridden by a child class to provide a more
|
|
||||||
efficient method to get these results.
|
|
||||||
"""
|
|
||||||
statement_list = self.filter()
|
|
||||||
|
|
||||||
responses = set()
|
|
||||||
to_remove = list()
|
|
||||||
for statement in statement_list:
|
|
||||||
for response in statement.in_response_to:
|
|
||||||
responses.add(response.text)
|
|
||||||
for statement in statement_list:
|
|
||||||
if statement.text not in responses:
|
|
||||||
to_remove.append(statement)
|
|
||||||
|
|
||||||
for statement in to_remove:
|
|
||||||
statement_list.remove(statement)
|
|
||||||
|
|
||||||
return statement_list
|
|
||||||
|
|
||||||
class EmptyDatabaseException(Exception):
|
|
||||||
|
|
||||||
def __init__(self,
|
|
||||||
value='The database currently contains no entries. '
|
|
||||||
'At least one entry is expected. '
|
|
||||||
'You may need to train your chat bot to populate your database.'):
|
|
||||||
self.value = value
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
||||||
|
|
||||||
class AdapterMethodNotImplementedError(NotImplementedError):
|
|
||||||
"""
|
|
||||||
An exception to be raised when a storage adapter method has not been implemented.
|
|
||||||
Typically this indicates that the method should be implemented in a subclass.
|
|
||||||
"""
|
|
||||||
pass
|
|
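The abstract `StorageAdapter` above raises `AdapterMethodNotImplementedError` from every storage method until a subclass overrides it. A rough sketch of that pattern follows, assuming a simplified stand-in base class (`BaseAdapter`) and a hypothetical in-memory subclass (`DictAdapter`) that stores plain dictionaries. Neither class is part of ChatterBot, and real adapters work with `Statement` objects, not dictionaries.

```python
class BaseAdapter:
    """Hypothetical, trimmed-down stand-in for the abstract adapter."""

    class AdapterMethodNotImplementedError(NotImplementedError):
        """Raised when a subclass has not overridden a storage method."""

    def count(self):
        raise self.AdapterMethodNotImplementedError(
            'The `count` method is not implemented by this adapter.'
        )

    def find(self, statement_text):
        raise self.AdapterMethodNotImplementedError(
            'The `find` method is not implemented by this adapter.'
        )

    def update(self, statement):
        raise self.AdapterMethodNotImplementedError(
            'The `update` method is not implemented by this adapter.'
        )


class DictAdapter(BaseAdapter):
    """Illustrative in-memory adapter keyed by statement text."""

    def __init__(self):
        self._statements = {}

    def count(self):
        return len(self._statements)

    def find(self, statement_text):
        # Returns None when no matching entry exists.
        return self._statements.get(statement_text)

    def update(self, statement):
        # Creates the entry if it is missing, otherwise overwrites it.
        self._statements[statement["text"]] = statement
        return statement


adapter = DictAdapter()
adapter.update({"text": "Hello"})
```

Because the exception subclasses `NotImplementedError`, callers can catch either type when probing what an adapter supports.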
@ -1,424 +0,0 @@
|
|||||||
import logging
|
|
||||||
import os
|
|
||||||
import sys
|
|
||||||
|
|
||||||
from . import utils
|
|
||||||
from .conversation import Statement, Response
|
|
||||||
|
|
||||||
|
|
||||||
class Trainer(object):
|
|
||||||
"""
|
|
||||||
Base class for all other trainer classes.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, storage, **kwargs):
|
|
||||||
self.chatbot = kwargs.get('chatbot')
|
|
||||||
self.storage = storage
|
|
||||||
self.logger = logging.getLogger(__name__)
|
|
||||||
self.show_training_progress = kwargs.get('show_training_progress', True)
|
|
||||||
|
|
||||||
def get_preprocessed_statement(self, input_statement):
|
|
||||||
"""
|
|
||||||
Preprocess the input statement.
|
|
||||||
"""
|
|
||||||
|
|
||||||
# The chatbot is optional to prevent backwards-incompatible changes
|
|
||||||
if not self.chatbot:
|
|
||||||
return input_statement
|
|
||||||
|
|
||||||
for preprocessor in self.chatbot.preprocessors:
|
|
||||||
input_statement = preprocessor(self, input_statement)
|
|
||||||
|
|
||||||
return input_statement
|
|
||||||
|
|
||||||
def train(self, *args, **kwargs):
|
|
||||||
"""
|
|
||||||
This method must be overridden by a child class.
|
|
||||||
"""
|
|
||||||
raise self.TrainerInitializationException()
|
|
||||||
|
|
||||||
def get_or_create(self, statement_text):
|
|
||||||
"""
|
|
||||||
Return a statement if it exists.
|
|
||||||
Create and return the statement if it does not exist.
|
|
||||||
"""
|
|
||||||
temp_statement = self.get_preprocessed_statement(
|
|
||||||
Statement(text=statement_text)
|
|
||||||
)
|
|
||||||
|
|
||||||
statement = self.storage.find(temp_statement.text)
|
|
||||||
|
|
||||||
if not statement:
|
|
||||||
statement = Statement(temp_statement.text)
|
|
||||||
|
|
||||||
return statement
|
|
||||||
|
|
||||||
class TrainerInitializationException(Exception):
|
|
||||||
"""
|
|
||||||
Exception raised when a child class has not overridden
|
|
||||||
the required methods on the Trainer base class.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, value=None):
|
|
||||||
default = (
|
|
||||||
'A training class must be specified before calling train(). ' +
|
|
||||||
'See http://chatterbot.readthedocs.io/en/stable/training.html'
|
|
||||||
)
|
|
||||||
self.value = value or default
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
return repr(self.value)
|
|
||||||
|
|
||||||
def _generate_export_data(self):
|
|
||||||
result = []
|
|
||||||
for statement in self.storage.filter():
|
|
||||||
for response in statement.in_response_to:
|
|
||||||
result.append([response.text, statement.text])
|
|
||||||
|
|
||||||
return result
|
|
||||||
|
|
||||||
def export_for_training(self, file_path='./export.json'):
|
|
||||||
"""
|
|
||||||
Create a file from the database that can be used to
|
|
||||||
train other chat bots.
|
|
||||||
"""
|
|
||||||
import json
|
|
||||||
export = {'conversations': self._generate_export_data()}
|
|
||||||
with open(file_path, 'w+') as jsonfile:
|
|
||||||
json.dump(export, jsonfile, ensure_ascii=True)
|
|
||||||
|
|
||||||
|
|
||||||
class ListTrainer(Trainer):
|
|
||||||
"""
|
|
||||||
Allows a chat bot to be trained using a list of strings
|
|
||||||
where the list represents a conversation.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def train(self, conversation):
|
|
||||||
"""
|
|
||||||
Train the chat bot based on the provided list of
|
|
||||||
statements that represents a single conversation.
|
|
||||||
"""
|
|
||||||
previous_statement_text = None
|
|
||||||
|
|
||||||
for conversation_count, text in enumerate(conversation):
|
|
||||||
if self.show_training_progress:
|
|
||||||
utils.print_progress_bar(
|
|
||||||
'List Trainer',
|
|
||||||
conversation_count + 1, len(conversation)
|
|
||||||
)
|
|
||||||
|
|
||||||
statement = self.get_or_create(text)
|
|
||||||
|
|
||||||
if previous_statement_text:
|
|
||||||
statement.add_response(
|
|
||||||
Response(previous_statement_text)
|
|
||||||
)
|
|
||||||
|
|
||||||
previous_statement_text = statement.text
|
|
||||||
self.storage.update(statement)
|
|
||||||
|
|
||||||
|
|
||||||
class ChatterBotCorpusTrainer(Trainer):
|
|
||||||
"""
|
|
||||||
Allows the chat bot to be trained using data from the
|
|
||||||
ChatterBot dialog corpus.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, storage, **kwargs):
|
|
||||||
super(ChatterBotCorpusTrainer, self).__init__(storage, **kwargs)
|
|
||||||
from .corpus import Corpus
|
|
||||||
|
|
||||||
self.corpus = Corpus()
|
|
||||||
|
|
||||||
def train(self, *corpus_paths):
|
|
||||||
|
|
||||||
# Allow a list of corpora to be passed instead of arguments
|
|
||||||
if len(corpus_paths) == 1:
|
|
||||||
if isinstance(corpus_paths[0], list):
|
|
||||||
corpus_paths = corpus_paths[0]
|
|
||||||
|
|
||||||
# Train the chat bot with each statement and response pair
|
|
||||||
for corpus_path in corpus_paths:
|
|
||||||
|
|
||||||
corpora = self.corpus.load_corpus(corpus_path)
|
|
||||||
|
|
||||||
corpus_files = self.corpus.list_corpus_files(corpus_path)
|
|
||||||
for corpus_count, corpus in enumerate(corpora):
|
|
||||||
for conversation_count, conversation in enumerate(corpus):
|
|
||||||
|
|
||||||
if self.show_training_progress:
|
|
||||||
utils.print_progress_bar(
|
|
||||||
str(os.path.basename(corpus_files[corpus_count])) + ' Training',
|
|
||||||
conversation_count + 1,
|
|
||||||
len(corpus)
|
|
||||||
)
|
|
||||||
|
|
||||||
previous_statement_text = None
|
|
||||||
|
|
||||||
for text in conversation:
|
|
||||||
statement = self.get_or_create(text)
|
|
||||||
statement.add_tags(corpus.categories)
|
|
||||||
|
|
||||||
if previous_statement_text:
|
|
||||||
statement.add_response(
|
|
||||||
Response(previous_statement_text)
|
|
||||||
)
|
|
||||||
|
|
||||||
previous_statement_text = statement.text
|
|
||||||
self.storage.update(statement)
|
|
||||||
|
|
||||||
|
|
||||||
class TwitterTrainer(Trainer):
|
|
||||||
"""
|
|
||||||
Allows the chat bot to be trained using data
|
|
||||||
gathered from Twitter.
|
|
||||||
|
|
||||||
:param random_seed_word: The seed word to be used to get random tweets from the Twitter API.
|
|
||||||
This parameter is optional. By default it is the word 'random'.
|
|
||||||
:param twitter_lang: Language for results as ISO 639-1 code.
|
|
||||||
This parameter is optional. Default is None (all languages).
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, storage, **kwargs):
|
|
||||||
super(TwitterTrainer, self).__init__(storage, **kwargs)
|
|
||||||
from twitter import Api as TwitterApi
|
|
||||||
|
|
||||||
# The word to be used as the first search term when searching for tweets
|
|
||||||
self.random_seed_word = kwargs.get('random_seed_word', 'random')
|
|
||||||
self.lang = kwargs.get('twitter_lang')
|
|
||||||
|
|
||||||
self.api = TwitterApi(
|
|
||||||
consumer_key=kwargs.get('twitter_consumer_key'),
|
|
||||||
consumer_secret=kwargs.get('twitter_consumer_secret'),
|
|
||||||
access_token_key=kwargs.get('twitter_access_token_key'),
|
|
||||||
access_token_secret=kwargs.get('twitter_access_token_secret')
|
|
||||||
)
|
|
||||||
|
|
||||||
def random_word(self, base_word, lang=None):
|
|
||||||
"""
|
|
||||||
Generate a random word using the Twitter API.
|
|
||||||
|
|
||||||
Search twitter for recent tweets containing the term 'random'.
|
|
||||||
Then randomly select one word from those tweets and do another
|
|
||||||
search with that word. Return a randomly selected word from the
|
|
||||||
new set of results.
|
|
||||||
"""
|
|
||||||
import random
|
|
||||||
random_tweets = self.api.GetSearch(term=base_word, count=5, lang=lang)
|
|
||||||
random_words = self.get_words_from_tweets(random_tweets)
|
|
||||||
random_word = random.choice(list(random_words))
|
|
||||||
tweets = self.api.GetSearch(term=random_word, count=5, lang=lang)
|
|
||||||
words = self.get_words_from_tweets(tweets)
|
|
||||||
word = random.choice(list(words))
|
|
||||||
return word
|
|
||||||
|
|
||||||
def get_words_from_tweets(self, tweets):
|
|
||||||
"""
|
|
||||||
Given a list of tweets, return the set of
|
|
||||||
words from the tweets.
|
|
||||||
"""
|
|
||||||
words = set()
|
|
||||||
|
|
||||||
for tweet in tweets:
|
|
||||||
tweet_words = tweet.text.split()
|
|
||||||
|
|
||||||
for word in tweet_words:
|
|
||||||
# If the word contains only letters with a length from 4 to 9
|
|
||||||
if word.isalpha() and 3 < len(word) <= 9:
|
|
||||||
words.add(word)
|
|
||||||
|
|
||||||
return words
|
|
||||||
|
|
||||||
def get_statements(self):
|
|
||||||
"""
|
|
||||||
Returns list of random statements from the API.
|
|
||||||
"""
|
|
||||||
from twitter import TwitterError
|
|
||||||
statements = []
|
|
||||||
|
|
||||||
# Generate a random word
|
|
||||||
random_word = self.random_word(self.random_seed_word, self.lang)
|
|
||||||
|
|
||||||
self.logger.info(u'Requesting 50 random tweets containing the word {}'.format(random_word))
|
|
||||||
tweets = self.api.GetSearch(term=random_word, count=50, lang=self.lang)
|
|
||||||
for tweet in tweets:
|
|
||||||
statement = Statement(tweet.text)
|
|
||||||
|
|
||||||
if tweet.in_reply_to_status_id:
|
|
||||||
try:
|
|
||||||
status = self.api.GetStatus(tweet.in_reply_to_status_id)
|
|
||||||
statement.add_response(Response(status.text))
|
|
||||||
statements.append(statement)
|
|
||||||
except TwitterError as error:
|
|
||||||
self.logger.warning(str(error))
|
|
||||||
|
|
||||||
self.logger.info('Adding {} tweets with responses'.format(len(statements)))
|
|
||||||
|
|
||||||
return statements
|
|
||||||
|
|
||||||
def train(self):
|
|
||||||
for _ in range(0, 10):
|
|
||||||
statements = self.get_statements()
|
|
||||||
for statement in statements:
|
|
||||||
self.storage.update(statement)
|
|
||||||
|
|
||||||
|
|
||||||
class UbuntuCorpusTrainer(Trainer):
|
|
||||||
"""
|
|
||||||
Allow chatbots to be trained with the data from
|
|
||||||
the Ubuntu Dialog Corpus.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, storage, **kwargs):
|
|
||||||
super(UbuntuCorpusTrainer, self).__init__(storage, **kwargs)
|
|
||||||
|
|
||||||
self.data_download_url = kwargs.get(
|
|
||||||
'ubuntu_corpus_data_download_url',
|
|
||||||
'http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/ubuntu_dialogs.tgz'
|
|
||||||
)
|
|
||||||
|
|
||||||
self.data_directory = kwargs.get(
|
|
||||||
'ubuntu_corpus_data_directory',
|
|
||||||
'./data/'
|
|
||||||
)
|
|
||||||
|
|
||||||
self.extracted_data_directory = os.path.join(
|
|
||||||
self.data_directory, 'ubuntu_dialogs'
|
|
||||||
)
|
|
||||||
|
|
||||||
# Create the data directory if it does not already exist
|
|
||||||
if not os.path.exists(self.data_directory):
|
|
||||||
os.makedirs(self.data_directory)
|
|
||||||
|
|
||||||
def is_downloaded(self, file_path):
|
|
||||||
"""
|
|
||||||
Check if the data file is already downloaded.
|
|
||||||
"""
|
|
||||||
if os.path.exists(file_path):
|
|
||||||
self.logger.info('File is already downloaded')
|
|
||||||
return True
|
|
||||||
|
|
||||||
return False
|
|
||||||
|
|
||||||
def is_extracted(self, file_path):
|
|
||||||
"""
|
|
||||||
Check if the data file is already extracted.
|
|
||||||
"""
|
|
||||||
|
|
||||||
if os.path.isdir(file_path):
|
|
||||||
self.logger.info('File is already extracted')
|
|
||||||
return True
|
|
||||||
return False
|
|
||||||
|
|
||||||
def download(self, url, show_status=True):
|
|
||||||
"""
|
|
||||||
Download a file from the given url.
|
|
||||||
Show a progress indicator for the download status.
|
|
||||||
Based on: http://stackoverflow.com/a/15645088/1547223
|
|
||||||
"""
|
|
||||||
import requests
|
|
||||||
|
|
||||||
file_name = url.split('/')[-1]
|
|
||||||
file_path = os.path.join(self.data_directory, file_name)
|
|
||||||
|
|
||||||
# Do not download the data if it already exists
|
|
||||||
if self.is_downloaded(file_path):
|
|
||||||
return file_path
|
|
||||||
|
|
||||||
with open(file_path, 'wb') as open_file:
|
|
||||||
print('Downloading %s' % url)
|
|
||||||
response = requests.get(url, stream=True)
|
|
||||||
total_length = response.headers.get('content-length')
|
|
||||||
|
|
||||||
if total_length is None:
|
|
||||||
# No content length header
|
|
||||||
open_file.write(response.content)
|
|
||||||
else:
|
|
||||||
download = 0
|
|
||||||
total_length = int(total_length)
|
|
||||||
for data in response.iter_content(chunk_size=4096):
|
|
||||||
download += len(data)
|
|
||||||
open_file.write(data)
|
|
||||||
if show_status:
|
|
||||||
done = int(50 * download / total_length)
|
|
||||||
sys.stdout.write('\r[%s%s]' % ('=' * done, ' ' * (50 - done)))
|
|
||||||
sys.stdout.flush()
|
|
||||||
|
|
||||||
# Add a new line after the download bar
|
|
||||||
sys.stdout.write('\n')
|
|
||||||
|
|
||||||
print('Download location: %s' % file_path)
|
|
||||||
return file_path
|
|
||||||
|
|
||||||
def extract(self, file_path):
|
|
||||||
"""
|
|
||||||
Extract a tar file at the specified file path.
|
|
||||||
"""
|
|
||||||
import tarfile
|
|
||||||
|
|
||||||
print('Extracting {}'.format(file_path))
|
|
||||||
|
|
||||||
if not os.path.exists(self.extracted_data_directory):
|
|
||||||
os.makedirs(self.extracted_data_directory)
|
|
||||||
|
|
||||||
def track_progress(members):
|
|
||||||
sys.stdout.write('.')
|
|
||||||
for member in members:
|
|
||||||
# This will be the current file being extracted
|
|
||||||
yield member
|
|
||||||
|
|
||||||
with tarfile.open(file_path) as tar:
|
|
||||||
tar.extractall(path=self.extracted_data_directory, members=track_progress(tar))
|
|
||||||
|
|
||||||
self.logger.info('File extracted to {}'.format(self.extracted_data_directory))
|
|
||||||
|
|
||||||
return True
|
|
||||||
|
|
||||||
def train(self):
|
|
||||||
import glob
|
|
||||||
import csv
|
|
||||||
|
|
||||||
# Download and extract the Ubuntu dialog corpus if needed
|
|
||||||
corpus_download_path = self.download(self.data_download_url)
|
|
||||||
|
|
||||||
# Extract if the directory does not already exist
|
|
||||||
if not self.is_extracted(self.extracted_data_directory):
|
|
||||||
self.extract(corpus_download_path)
|
|
||||||
|
|
||||||
extracted_corpus_path = os.path.join(
|
|
||||||
self.extracted_data_directory,
|
|
||||||
'**', '**', '*.tsv'
|
|
||||||
)
|
|
||||||
|
|
||||||
# Specify the encoding in Python versions 3 and up
|
|
||||||
file_kwargs = {'encoding': 'utf-8'}
|
|
||||||
# WARNING: This might fail to read a unicode corpus file in Python 2.x
|
|
||||||
|
|
||||||
for file in glob.iglob(extracted_corpus_path):
|
|
||||||
self.logger.info('Training from: {}'.format(file))
|
|
||||||
|
|
||||||
with open(file, 'r', **file_kwargs) as tsv:
|
|
||||||
reader = csv.reader(tsv, delimiter='\t')
|
|
||||||
|
|
||||||
previous_statement_text = None
|
|
||||||
|
|
||||||
for row in reader:
|
|
||||||
if len(row) > 0:
|
|
||||||
text = row[3]
|
|
||||||
statement = self.get_or_create(text)
|
|
||||||
print(text, len(row))
|
|
||||||
|
|
||||||
statement.add_extra_data('datetime', row[0])
|
|
||||||
statement.add_extra_data('speaker', row[1])
|
|
||||||
|
|
||||||
if row[2].strip():
|
|
||||||
statement.add_extra_data('addressing_speaker', row[2])
|
|
||||||
|
|
||||||
if previous_statement_text:
|
|
||||||
statement.add_response(
|
|
||||||
Response(previous_statement_text)
|
|
||||||
)
|
|
||||||
|
|
||||||
previous_statement_text = statement.text
|
|
||||||
self.storage.update(statement)
|
|
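All of the trainers above share one chaining step: each utterance is stored as a response to the utterance before it, and the first utterance in a conversation has no predecessor. The sketch below isolates that step with plain strings and tuples. `pair_with_previous` is a hypothetical helper written for illustration; the real trainers attach `Response` objects to `Statement` objects and write them to storage.

```python
def pair_with_previous(conversation):
    """Return (previous_text, current_text) pairs for a list of utterances."""
    pairs = []
    previous_text = None

    for text in conversation:
        # The first utterance has nothing to respond to, so it yields no pair.
        if previous_text:
            pairs.append((previous_text, text))
        previous_text = text

    return pairs


pairs = pair_with_previous(["Hi", "Hello", "How are you?"])
```

As in the original loop, the check is a truthiness test, so an empty string in the conversation breaks the chain at that point.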
@ -1,199 +0,0 @@
|
|||||||
"""
|
|
||||||
ChatterBot utility functions
|
|
||||||
"""
|
|
||||||
|
|
||||||
|
|
||||||
def import_module(dotted_path):
|
|
||||||
"""
|
|
||||||
Imports the specified module based on the
|
|
||||||
dot notated import path for the module.
|
|
||||||
"""
|
|
||||||
import importlib
|
|
||||||
|
|
||||||
module_parts = dotted_path.split('.')
|
|
||||||
if module_parts[:2] == ["chatter", "chatterbot"]:
|
|
||||||
# An import path starting with chatter.chatterbot means it comes from this
|
|
||||||
# package, and should be imported relatively.
|
|
||||||
package = __package__
|
|
||||||
module_parts = module_parts[2:]
|
|
||||||
module_parts[0] = "." + module_parts[0]
|
|
||||||
else:
|
|
||||||
package = None
|
|
||||||
module_path = '.'.join(module_parts[:-1])
|
|
||||||
module = importlib.import_module(module_path, package=package)
|
|
||||||
|
|
||||||
return getattr(module, module_parts[-1])
|
|
||||||
|
|
||||||
|
|
||||||
def initialize_class(data, **kwargs):
|
|
||||||
"""
|
|
||||||
:param data: A string or dictionary containing an import_path attribute.
|
|
||||||
"""
|
|
||||||
if isinstance(data, dict):
|
|
||||||
import_path = data.get('import_path')
|
|
||||||
data.update(kwargs)
|
|
||||||
Class = import_module(import_path)
|
|
||||||
|
|
||||||
return Class(**data)
|
|
||||||
else:
|
|
||||||
Class = import_module(data)
|
|
||||||
|
|
||||||
return Class(**kwargs)
|
|
||||||
|
|
||||||
|
|
||||||
def validate_adapter_class(validate_class, adapter_class):
|
|
||||||
"""
|
|
||||||
Raises an exception if validate_class is not a
|
|
||||||
subclass of adapter_class.
|
|
||||||
|
|
||||||
:param validate_class: The class to be validated.
|
|
||||||
:type validate_class: class
|
|
||||||
|
|
||||||
:param adapter_class: The class type to check against.
|
|
||||||
:type adapter_class: class
|
|
||||||
|
|
||||||
:raises: Adapter.InvalidAdapterTypeException
|
|
||||||
"""
|
|
||||||
from .adapters import Adapter
|
|
||||||
|
|
||||||
# If a dictionary was passed in, check if it has an import_path attribute
|
|
||||||
if isinstance(validate_class, dict):
|
|
||||||
|
|
||||||
if 'import_path' not in validate_class:
|
|
||||||
raise Adapter.InvalidAdapterTypeException(
|
|
||||||
'The dictionary {} must contain a value for "import_path"'.format(
|
|
||||||
str(validate_class)
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
# Set the class to the import path for the next check
|
|
||||||
validate_class = validate_class.get('import_path')
|
|
||||||
|
|
||||||
if not issubclass(import_module(validate_class), adapter_class):
|
|
||||||
raise Adapter.InvalidAdapterTypeException(
|
|
||||||
'{} must be a subclass of {}'.format(
|
|
||||||
validate_class,
|
|
||||||
adapter_class.__name__
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def input_function():
|
|
||||||
"""
|
|
||||||
Normalizes reading input between python 2 and 3.
|
|
||||||
The function 'raw_input' becomes 'input' in Python 3.
|
|
||||||
"""
|
|
||||||
|
|
||||||
user_input = input() # NOQA
|
|
||||||
|
|
||||||
return user_input
|
|
||||||
|
|
||||||
|
|
||||||
def nltk_download_corpus(resource_path):
|
|
||||||
"""
|
|
||||||
Download the specified NLTK corpus file
|
|
||||||
unless it has already been downloaded.
|
|
||||||
|
|
||||||
Returns True if the corpus needed to be downloaded.
|
|
||||||
"""
|
|
||||||
from nltk.data import find
|
|
||||||
from nltk import download
|
|
||||||
from os.path import split, sep
|
|
||||||
from zipfile import BadZipfile
|
|
||||||
|
|
||||||
# Download the NLTK data only if it is not already downloaded
|
|
||||||
_, corpus_name = split(resource_path)
|
|
||||||
|
|
||||||
# From http://www.nltk.org/api/nltk.html
|
|
||||||
# When using find() to locate a directory contained in a zipfile,
|
|
||||||
# the resource name must end with the forward slash character.
|
|
||||||
# Otherwise, find() will not locate the directory.
|
|
||||||
#
|
|
||||||
# Helps when resource_path=='sentiment/vader_lexicon'
|
|
||||||
if not resource_path.endswith(sep):
|
|
||||||
resource_path = resource_path + sep
|
|
||||||
|
|
||||||
downloaded = False
|
|
||||||
|
|
||||||
try:
|
|
||||||
find(resource_path)
|
|
||||||
except LookupError:
|
|
||||||
download(corpus_name)
|
|
||||||
downloaded = True
|
|
||||||
except BadZipfile:
|
|
||||||
raise BadZipfile(
|
|
||||||
'The NLTK corpus file being opened is not a zipfile, '
|
|
||||||
'or it has been corrupted and needs to be manually deleted.'
|
|
||||||
)
|
|
||||||
|
|
||||||
return downloaded
|
|
||||||
|
|
||||||
|
|
||||||
def remove_stopwords(tokens, language):
|
|
||||||
"""
|
|
||||||
Takes a language (e.g. 'english') and a set of word tokens.
|
|
||||||
Returns the tokenized text with any stopwords removed.
|
|
||||||
Stop words are words like "is, the, a, ..."
|
|
||||||
|
|
||||||
Be sure to download the required NLTK corpus before calling this function:
|
|
||||||
- from chatterbot.utils import nltk_download_corpus
|
|
||||||
- nltk_download_corpus('corpora/stopwords')
|
|
||||||
"""
|
|
||||||
from nltk.corpus import stopwords
|
|
||||||
|
|
||||||
# Get the stopwords for the specified language
|
|
||||||
stop_words = stopwords.words(language)
|
|
||||||
|
|
||||||
# Remove the stop words from the set of word tokens
|
|
||||||
tokens = set(tokens) - set(stop_words)
|
|
||||||
|
|
||||||
return tokens
|
|
||||||
|
|
||||||
|
|
||||||
def get_response_time(chatbot):
|
|
||||||
"""
|
|
||||||
Returns the amount of time taken for a given
|
|
||||||
chat bot to return a response.
|
|
||||||
|
|
||||||
:param chatbot: A chat bot instance.
|
|
||||||
:type chatbot: ChatBot
|
|
||||||
|
|
||||||
:returns: The response time in seconds.
|
|
||||||
:rtype: float
|
|
||||||
"""
|
|
||||||
import time
|
|
||||||
|
|
||||||
start_time = time.time()
|
|
||||||
|
|
||||||
chatbot.get_response('Hello')
|
|
||||||
|
|
||||||
return time.time() - start_time
|
|
||||||
|
|
||||||
|
|
||||||
def print_progress_bar(description, iteration_counter, total_items, progress_bar_length=20):
|
|
||||||
"""
|
|
||||||
Print progress bar
|
|
||||||
:param description: Training description
|
|
||||||
:type description: str
|
|
||||||
|
|
||||||
:param iteration_counter: Incremental counter
|
|
||||||
:type iteration_counter: int
|
|
||||||
|
|
||||||
:param total_items: Total number of items
|
|
||||||
:type total_items: int
|
|
||||||
|
|
||||||
:param progress_bar_length: Progress bar length
|
|
||||||
:type progress_bar_length: int
|
|
||||||
|
|
||||||
:returns: void
|
|
||||||
:rtype: void
|
|
||||||
"""
|
|
||||||
import sys
|
|
||||||
|
|
||||||
percent = float(iteration_counter) / total_items
|
|
||||||
hashes = '#' * int(round(percent * progress_bar_length))
|
|
||||||
spaces = ' ' * (progress_bar_length - len(hashes))
|
|
||||||
sys.stdout.write("\r{0}: [{1}] {2}%".format(description, hashes + spaces, int(round(percent * 100))))
|
|
||||||
sys.stdout.flush()
|
|
||||||
if total_items == iteration_counter:
|
|
||||||
print("\r")
|
|
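`import_module` and `validate_adapter_class` above resolve a dotted path to a class and then check it against an expected base class. A simplified sketch of that idea follows, using only the standard library. `import_by_path` and `is_valid_adapter` are hypothetical names, and the sketch deliberately omits the relative-import handling for paths that start with `chatter.chatterbot`.

```python
import importlib


def import_by_path(dotted_path):
    """Import 'package.module.Name' and return the attribute 'Name'."""
    # Everything before the last dot is the module; the rest is the attribute.
    module_path, _, attribute = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


def is_valid_adapter(dotted_path, base_class):
    """Return True if the class at dotted_path subclasses base_class."""
    return issubclass(import_by_path(dotted_path), base_class)


ordered_dict_class = import_by_path("collections.OrderedDict")
```

The original raises `Adapter.InvalidAdapterTypeException` where this sketch returns `False`. A path without a dot is not handled here and makes `importlib.import_module` raise `ValueError`.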
@ -0,0 +1,12 @@
|
|||||||
|
git+git://github.com/gunthercox/chatterbot-corpus@master#egg=chatterbot_corpus
|
||||||
|
mathparse>=0.1,<0.2
|
||||||
|
nltk>=3.2,<4.0
|
||||||
|
pint>=0.8.1
|
||||||
|
python-dateutil>=2.8,<2.9
|
||||||
|
pyyaml>=5.3,<5.4
|
||||||
|
sqlalchemy>=1.3,<1.4
|
||||||
|
pytz
|
||||||
|
spacy>=2.3,<2.4
|
||||||
|
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz#egg=en_core_web_sm
|
||||||
|
https://github.com/explosion/spacy-models/releases/download/en_core_web_md-2.3.1/en_core_web_md-2.3.1.tar.gz#egg=en_core_web_md
|
||||||
|
# https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz#egg=en_core_web_lg
|