JokeR: SeqGAN for Joke Generation (TensorFlow)

A SeqGAN implementation for generating jokes, built on an LSTM-based RNN.

Joke generation is a challenging and understudied task in Natural Language Processing. A computer that intends to generate jokes, and succeeds, could be deemed artificially intelligent. We present a couple of novel approaches to joke generation, including SeqGAN and a language model. We implement a variety of word-level models that tackle the two parts of the joke-generation problem: text generation and joke classification. Ideally, merging these steps lets a generator write joke candidates that are then pruned by a well-trained classifier. We train these models on a corpus of 231,657 user-written jokes scraped from reddit.com.
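The generate-then-prune idea above can be sketched in a few lines. This is a minimal illustration, not the project's code: `generate_candidates` and `joke_score` are hypothetical stand-ins for the trained SeqGAN generator and the joke classifier.

```python
import random

def generate_candidates(n, vocab, max_len=8):
    # Stand-in for the SeqGAN generator: samples random word sequences.
    random.seed(0)
    return [" ".join(random.choices(vocab, k=random.randint(3, max_len)))
            for _ in range(n)]

def joke_score(text):
    # Stand-in for the joke classifier: returns a score in [0, 1].
    # Here it just rewards lexical variety; the real model is learned.
    return min(1.0, len(set(text.split())) / 8.0)

def prune(candidates, threshold=0.5):
    # Keep only candidates the classifier rates above the threshold.
    return [c for c in candidates if joke_score(c) >= threshold]

vocab = ["I", "was", "in", "the", "bar", "what", "do", "you", "call", "a"]
candidates = generate_candidates(10, vocab)
jokes = prune(candidates)
```

In the full system, the stand-ins would be replaced by the trained generator and classifier; the pruning step itself is unchanged.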

Samples

Some sample jokes generated by the model:

• I was in the bar, but I wasn’t the bartender.

• What do you call a bar, a bar because it was a bar

• What do you call a girlfriend? a child to the bar.

• I was in the bar and the bar says I am not a bar.

Usage

To train the default model to generate lines from sample_text.txt, use:

>>> ./train

To sample from a trained model, use:

>>> ./sample <sample_len>

To specify your own text, use:

>>> ./train -t /path/to/your/file.txt

To see all the training options, use:

>>> ./train --help

This gives:

usage: train.py [-h] [-t TEXT] [-l SEQ_LEN] [-b BATCH_SIZE] [-n NUM_STEPS]
                [-e NUM_EPOCHS] [-c] [-p LEARN_PHASE]

Train a SeqGAN model on some text.

optional arguments:
  -h, --help            show this help message and exit
  -t TEXT, --text TEXT  path to the text to use
  -l SEQ_LEN, --seq_len SEQ_LEN
                        the length of each training sequence
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        size of each training batch
  -n NUM_STEPS, --num_steps NUM_STEPS
                        number of steps per epoch
  -e NUM_EPOCHS, --num_epochs NUM_EPOCHS
                        number of training epochs
  -c, --only_cpu        if set, only build weights on cpu
  -p LEARN_PHASE, --learn_phase LEARN_PHASE
                        learning phase (None for synchronized)