Seattle Pet License Data

In January of this year, Seattle released a list of active/current pet licenses in the city.

For each of the 66,042 pets in the list we have the following data:

  • License Issue Date
  • License Number
  • Animal's Name
  • Species
  • Primary Breed
  • Secondary Breed
  • ZIP Code

The data.seattle.gov website makes it fairly easy to explore, filter, and plot the data. However, since the only numeric fields are the License Number and ZIP Code, the plots you can make online are a bit limited. Luckily, the dataset is easily downloaded as a CSV file.

Pet name word clouds sorted by breed

When I initially browsed the data online, I came across a Pug named Franklin Tucker. I thought this was a hilarious name for a Pug, and it gave me the idea to create word clouds of dog names separated by breed. These word clouds provide a fun way to see the most common names for a given breed of dog. To make them, I used the wordcloud package.

In [1]:
#Import the usual stuff
import numpy as np
from PIL import Image
from os import path
import matplotlib.pyplot as plt
import random
import pandas as pd
from wordcloud import WordCloud, STOPWORDS

Create a pandas DataFrame

The data are easily read into a dataframe given that we were able to download them as a csv file. I like using pandas for this sort of thing because it makes filtering the data on single or multiple columns extremely easy.

In [2]:
pet_data = pd.read_csv('../data/Seattle_Pet_Licenses.csv',index_col=None)
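
To make the point about filtering concrete, here is a minimal sketch of single-column and multi-column filters. The toy dataframe and its rows are made up for illustration; only the column names come from the real CSV.

```python
import pandas as pd

# Hypothetical rows that only mimic the columns of the real CSV
toy = pd.DataFrame({
    "Animal's Name": ["Ginger", "Oscar", "Luna", "Max"],
    "Species": ["Dog", "Dog", "Cat", "Dog"],
    "Primary Breed": ["Retriever, Golden", "Pug", "Siamese", "Pug"],
    "ZIP Code": ["98117", "98115", "98115", "98103"],
})

# Filter on a single column with a boolean mask
dogs = toy[toy["Species"] == "Dog"]

# Filter on several columns at once by combining masks with &
pugs_in_98115 = toy[(toy["Primary Breed"] == "Pug") & (toy["ZIP Code"] == "98115")]

print(len(dogs), list(pugs_in_98115["Animal's Name"]))
```

The same pattern should carry over to pet_data, assuming its ZIP Code column holds strings as it does in this toy frame.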

One quick way to split the dataframe into groups is using the pandas groupby function.

In [3]:
breed_groups = pet_data.groupby(['Primary Breed']).groups

breed_groups is now a dictionary whose keys are the Primary Breeds available in the original data; each key maps to the index labels of the matching rows in pet_data. Let's see how many different breeds are represented in the data.

In [4]:
print (len(breed_groups))
323
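
To see what the .groups dictionary looks like, here is a small sketch on a made-up dataframe (the rows are hypothetical). Strictly speaking, the values are index labels, so .loc is the matching indexer; .iloc returns the same rows only while the dataframe keeps its default 0, 1, 2, ... index.

```python
import pandas as pd

# Hypothetical rows, used only to show the shape of .groups
toy = pd.DataFrame({
    "Animal's Name": ["Ginger", "Oscar", "Nugget", "Max"],
    "Primary Breed": ["Retriever, Golden", "Pug", "Retriever, Golden", "Pug"],
})

# Maps each breed to the index labels of its rows
groups = toy.groupby("Primary Breed").groups
for breed, labels in groups.items():
    print(breed, list(labels))

# Use the labels to pull the matching rows back out of the dataframe
goldens = toy.loc[groups["Retriever, Golden"]]
```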

Let's also print each breed and the number of pets that match that breed. This highlights how we can use the results of the groupby function to slice the original data.

In [5]:
for breed in breed_groups.keys():
    print (breed,len(pet_data.loc[breed_groups[breed]]))
Abyssinian 58
Affenpinscher 8
Afghan Hound 15
Akbash 3
Akita 74
Alapaha Blue Blood Bulldog 1
Alaskan Husky 52
Alaskan Klee Kai 16
Alaskan Malamute 128
American Bandogge Mastiff 3
American Blue Heeler 26
American Bobtail 4
American Curl 5
American Eskimo 142
American Indian Dog 2
American Shorthair 992
American Wirehair 2
Anatolian Shepherd 17
Angora 10
Appenzeller Sennenhunde 1
Argentine Dogo 9
Asian Shorthair 5
Australian Cattle Dog 728
Australian Kelpie 58
Australian Shepherd 1184
Australian Shepherd, Miniature 116
Azores Cattle Dog 1
Balinese 40
Barbet 1
Basenji 140
Basset Hound 181
Beagle 744
Bearded Collie 23
Beauceron 6
Belgian Malinois 61
Belgian Sheepdog 23
Belgian Tervuren 24
Bengal 98
Bernese Mountain Dog 228
Bichon Frise 360
Birman 27
Bloodhound 15
Boerboel 4
Bolognese 3
Bombay 45
Border Collie 991
Borzoi 9
Bouvier des Flandres 23
Boxer 635
Braque Francais Gascogne 1
Braque Francais Pyrenees 2
Briard 15
British Shorthair 52
Brittany 152
Brittany, French 1
Bulldog 75
Bulldog, American 112
Bulldog, English 209
Bulldog, French 463
Bullmastiff 39
Burmese 63
California Spangled 2
Canaan 9
Canadian Eskimo 2
Cane Corso 18
Carolina Dog 15
Catahoula Leopard dog 141
Caucasian Mountain Dog 1
Chartreux 8
Chihuahua, Long Coat 181
Chihuahua, Short Coat 2134
Chinese Crested 27
Chinese Domestic 2
Chinese Shar-Pei 91
Chinook 15
Chow Chow 186
Cirneco dell Etna 2
Collie, Rough 120
Collie, Smooth 36
Colourpoint Shorthair 4
Coonhound 61
Coonhound, American English 2
Coonhound, Black and Tan 7
Coonhound, Bluetick 4
Coonhound, Redbone 31
Coonhound, Treeing Walker 27
Cornish Rex 3
Coton de Tulear 70
Croatian Sheepdog 1
Cur, Black-Mouth 21
Cur, Mountain 5
Dachshund, Miniature Long Haired 51
Dachshund, Miniature Smooth Haired 350
Dachshund, Miniature Wire Haired 10
Dachshund, Standard Long Haired 88
Dachshund, Standard Smooth Haired 580
Dachshund, Standard Wire Haired 45
Dalmatian 75
Deerhound, Scottish 7
Devon Rex 27
Doberman Pinscher 177
Dogue de Bordeaux 10
Domestic Longhair 1916
Domestic Medium Hair 3020
Domestic Shorthair 13134
Dutch Sheepdog 1
Dutch Shepherd 10
Dutch Smoushond 1
Egyptian Mau 9
English Mastiff 16
English Shepherd 7
Entlebucher Mountain Dog 13
Estrela Mountain Dog 1
Eurasier 2
European Shorthair 10
Exotic Shorthair 20
Feist 3
Fila Brasileiro 1
Finnish Lapphund 3
Finnish Spitz 12
Foxhound, American 16
Foxhound, English 6
German Pinscher 7
German Shepherd 1259
German Shepherd, King 3
German Spitz 3
Great Dane 167
Great Pyrenees 109
Greater Swiss Mountain Dog 33
Greyhound 207
Griffon Vendeen, Grand Basset 1
Griffon Vendeen, Petit Basset 4
Griffon, Brussels 51
Griffon, Wirehaired Pointing 19
Groenendael 1
Haldenstoever 1
Harrier 7
Havanese 566
Himalayan 73
Hound 203
Ibizan Hound 7
Icelandic Sheepdog 6
Irish Wolfhound 16
Italian Greyhound 117
Japanese Bobtail 5
Japanese Chin 46
Japanese Spitz 1
Kai Ken 6
Kangal 1
Karelian Bear Dog 12
Keeshond 43
Kooikerhondje 6
Korean Jindo 31
Kuvasz 7
LaPerm 752
Lacy 2
Lagotto Romagnolo 10
Landseer 5
Leonberger 25
Lhasa Apso 231
Lowchen 10
Maine Coon 394
Maltese 459
Manx 111
Maremma Sheepdog 1
Mastiff 107
McNab Herding Dog 3
Mexican Hairless 4
Miniature 44
Miniature Pinscher 280
Mix 602
Mixed Breed, Large (over 44 lbs fully grown) 8
Mixed Breed, Medium (up to 44 lbs fully grown) 6
Munchkin 2
Munsterlander, Large  3
Munsterlander,Small 1
Neapolitan Mastiff 4
Newfoundland 123
Norwegian Buhund 2
Norwegian Elkhound 39
Norwegian Forest 66
Norwegian Lundehund 3
Ocicat 10
Old English Sheepdog 51
Olde English Bulldogge 15
Oriental Shorthair 12
Otterhound 2
Papillon 152
Pekingese 120
Perdiguero Gallego 1
Perro de Presa Canario 2
Persian 125
Peruvian Inca Orchid 2
Pharaoh Hound 7
Pixie-Bob 53
Plott Hound 39
Podenco Canario 1
Pointer 124
Pointer, German Shorthaired 214
Pointer, German Wirehaired 31
Pomeranian 600
Poodle, Miniature 929
Poodle, Standard 490
Poodle, Toy 224
Portuguese Podengo 3
Portuguese Water Dog 163
Pot Bellied 1
Pot-Bellied 4
Pudelpointer 1
Pug 819
Puli 10
Pyrenean Shepherd 2
RagaMuffin 5
Ragdoll 187
Retriever 120
Retriever, Chesapeake Bay 90
Retriever, Curly Coated 12
Retriever, Flat-Coated 119
Retriever, Golden 2321
Retriever, Labrador 6350
Retriever, Nova Scotia Duck Tolling 39
Rhodesian Ridgeback 204
Rottweiler 266
Royal Bahamian Potcake 1
Russian Blue 171
Saint Bernard 60
Saluki 11
Samoyed 69
Savannah 1
Schipperke 64
Schnauzer, Giant 19
Schnauzer, Miniature 627
Schnauzer, Standard 74
Scottish Fold 24
Setter, English 58
Setter, Gordon 19
Setter, Irish 38
Shepherd 519
Shetland Sheepdog 179
Shiba Inu 356
Shih Tzu 828
Shiloh Shepherd 2
Siamese 977
Siberian 77
Siberian Husky 522
Silken Windhound 5
Singapura 6
Snowshoe 63
Somali 10
Spaniel 119
Spaniel, American Cocker 529
Spaniel, American Water 6
Spaniel, Boykin 1
Spaniel, Cavalier King Charles 455
Spaniel, Clumber 8
Spaniel, English Cocker 71
Spaniel, English Springer 171
Spaniel, English Toy 2
Spaniel, Field 7
Spaniel, Irish Water 14
Spaniel, Sussex 1
Spaniel, Tibetan 26
Spaniel, Welsh Springer 91
Spanish Water Dog 1
Sphynx 8
Spinone Italiano 19
Stabyhoun 1
Standard 2
Swedish Vallhund 3
Terrier 1119
Terrier, Airedale 89
Terrier, American Hairless 1
Terrier, American Pit Bull 973
Terrier, American Staffordshire 202
Terrier, Australian 36
Terrier, Bedlington 4
Terrier, Black Russian 2
Terrier, Border 104
Terrier, Boston 543
Terrier, Bull 35
Terrier, Cairn 223
Terrier, Dandie Dinmont 5
Terrier, English Staffordshire 2
Terrier, Fox, Smooth 42
Terrier, Fox, Toy 31
Terrier, Fox, Wire 56
Terrier, Irish 36
Terrier, Jack Russell 565
Terrier, Kerry Blue 13
Terrier, Lakeland 18
Terrier, Manchester 30
Terrier, Miniature Bull 3
Terrier, Norfolk 22
Terrier, Norwich 37
Terrier, Parson Russell 8
Terrier, Pit Bull 154
Terrier, Rat 309
Terrier, Russell 12
Terrier, Scottish 107
Terrier, Sealyham 4
Terrier, Silky 48
Terrier, Skye 1
Terrier, Soft Coated Wheaten 196
Terrier, Staffordshire Bull 99
Terrier, Tibetan 62
Terrier, Welsh 39
Terrier, West Highland White 249
Terrier, Yorkshire 690
Thai Ridgeback 2
Tibetan Mastiff 11
Tiffany 4
Tonkinese 13
Toyger 4
Turkish Angora 5
Turkish Van 17
Vizsla, Smooth Haired 197
Vizsla, Wire Haired 1
Weimaraner 140
Welsh Corgi, Cardigan 270
Welsh Corgi, Pembroke 372
Whippet 117
White Swiss Shepherd (Berger Blanc Suisse) 3
Xoloitzcuintli 795
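
If all you want is the per-breed counts, value_counts() is a shorter route than looping over the groups, and it sorts from most to least common. This is a sketch on made-up rows, not the real data.

```python
import pandas as pd

# Hypothetical rows standing in for the Primary Breed column
toy = pd.DataFrame({
    "Primary Breed": ["Pug", "Retriever, Golden", "Pug", "Siamese", "Pug", "Siamese"],
})

# Count rows per breed, sorted from most to least common
breed_counts = toy["Primary Breed"].value_counts()
print(breed_counts.head(2))
```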

Now we can easily grab all of the data for a certain breed of pet. As an example let's grab the golden retrievers.

In [6]:
golden_df = pet_data.loc[breed_groups['Retriever, Golden']]
golden_df.head()
Out[6]:
    License Issue Date  License Number  Animal's Name  Species  Primary Breed      Secondary Breed      ZIP Code
2   January 20 2006     29654           Ginger         Dog      Retriever, Golden  Retriever, Labrador  98117
4   August 04 2006      729899          Addy           Dog      Retriever, Golden  NaN                  98105
23  March 18 2010       132643          Nugget         Dog      Retriever, Golden  NaN                  98104
44  December 29 2012    25863           Spencer        Dog      Retriever, Golden  Poodle, Standard     98107
65  May 29 2013         S111409         Romeo          Dog      Retriever, Golden  NaN                  98144

You could also use groupby to group by both Primary Breed and ZIP Code. This gives you a quick way to look up how prevalent a breed is in a given area.

In [7]:
breed_groups = pet_data.groupby(['Primary Breed', 'ZIP Code']).groups

As an example, let's look at all of the Standard Smooth Haired Dachshunds that live in the 98115 ZIP code.

In [8]:
greenlake_sausages = pet_data.loc[breed_groups['Dachshund, Standard Smooth Haired','98115']]
greenlake_sausages
Out[8]:
       License Issue Date  License Number  Animal's Name      Species  Primary Breed                      Secondary Breed        ZIP Code
1239   December 18 2014    946173          Deiter             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
2382   December 26 2014    903321          Chai               Dog      Dachshund, Standard Smooth Haired  Terrier                98115
4454   March 17 2015       132145          Cooper             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
8806   April 17 2015       212877          Rosetta            Dog      Dachshund, Standard Smooth Haired  NaN                    98115
9453   June 22 2015        360228          Jax                Dog      Dachshund, Standard Smooth Haired  Chihuahua, Short Coat  98115
10963  January 13 2015     959613          Winnie             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
11725  January 13 2015     29840           Foxy               Dog      Dachshund, Standard Smooth Haired  NaN                    98115
12497  February 19 2015    85453           Ivy                Dog      Dachshund, Standard Smooth Haired  Mix                    98115
13229  April 10 2015       142889          Caramel            Dog      Dachshund, Standard Smooth Haired  NaN                    98115
13372  April 21 2015       84783           Widget             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
20339  December 12 2015    828848          Rusty              Dog      Dachshund, Standard Smooth Haired  NaN                    98115
20497  December 08 2015    958028          Winston            Dog      Dachshund, Standard Smooth Haired  NaN                    98115
21341  August 06 2015      582297          Tigerlily          Dog      Dachshund, Standard Smooth Haired  NaN                    98115
25183  November 10 2015    904678          Jax                Dog      Dachshund, Standard Smooth Haired  NaN                    98115
25944  July 20 2015        29480           Tina               Dog      Dachshund, Standard Smooth Haired  Terrier                98115
28107  August 27 2015      S103067         Penny              Dog      Dachshund, Standard Smooth Haired  Italian Greyhound      98115
28273  September 04 2015   S103211         Ziva               Dog      Dachshund, Standard Smooth Haired  Beagle                 98115
31708  December 08 2015    S108631         Ruby               Dog      Dachshund, Standard Smooth Haired  Chihuahua, Short Coat  98115
32906  February 13 2016    9765            BeBop              Dog      Dachshund, Standard Smooth Haired  NaN                    98115
36001  February 16 2016    80352           Yoshi              Dog      Dachshund, Standard Smooth Haired  NaN                    98115
36630  February 06 2016    26015           Harley             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
36878  February 13 2016    441122          Amexia             Dog      Dachshund, Standard Smooth Haired  Retriever, Labrador    98115
37913  March 26 2016       582491          Hamlet             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
38142  April 21 2016       213998          Dozer              Dog      Dachshund, Standard Smooth Haired  Mix                    98115
38254  May 10 2016         281732          AnnaBelle          Dog      Dachshund, Standard Smooth Haired  Chihuahua, Short Coat  98115
41734  January 29 2016     28744           Louis Jadot        Dog      Dachshund, Standard Smooth Haired  NaN                    98115
43950  February 23 2016    S111956         Bentley Halterman  Dog      Dachshund, Standard Smooth Haired  NaN                    98115
48320  November 15 2016    950236          Lola               Dog      Dachshund, Standard Smooth Haired  NaN                    98115
48457  October 25 2016     127047          Trixie             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
50211  July 07 2016        282935          Zach               Dog      Dachshund, Standard Smooth Haired  Terrier, Yorkshire     98115
52508  October 17 2016     727134          Bella              Dog      Dachshund, Standard Smooth Haired  NaN                    98115
55295  August 23 2016      586751          Ty                 Dog      Dachshund, Standard Smooth Haired  Mix                    98115
58141  September 02 2016   S118897         Tweak              Dog      Dachshund, Standard Smooth Haired  NaN                    98115
59892  August 25 2016      S120369         Molly              Dog      Dachshund, Standard Smooth Haired  Chihuahua, Short Coat  98115
63152  December 24 2016    946173          Deiter             Dog      Dachshund, Standard Smooth Haired  NaN                    98115
63379  December 14 2016    827371          Fanny              Dog      Dachshund, Standard Smooth Haired  NaN                    98115
63986  November 23 2016    903321          Chai               Dog      Dachshund, Standard Smooth Haired  Terrier                98115
65968  December 28 2016    S125251         Max                Dog      Dachshund, Standard Smooth Haired  Chihuahua, Short Coat  98115
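
For a one-off lookup like this, two alternatives avoid building the full .groups dictionary: get_group() with a (breed, ZIP) tuple, or a plain boolean mask. The sketch below uses made-up rows and assumes ZIP codes are stored as strings.

```python
import pandas as pd

# Hypothetical rows; ZIP codes are assumed to be strings
toy = pd.DataFrame({
    "Animal's Name": ["Rusty", "Penny", "Ziva", "Luna"],
    "Primary Breed": ["Dachshund, Standard Smooth Haired"] * 3 + ["Siamese"],
    "ZIP Code": ["98115", "98115", "98103", "98115"],
})

grouped = toy.groupby(["Primary Breed", "ZIP Code"])

# Option 1: ask the groupby object for a single (breed, ZIP) group
via_get_group = grouped.get_group(("Dachshund, Standard Smooth Haired", "98115"))

# Option 2: combine two boolean masks on the original dataframe
mask = (toy["Primary Breed"] == "Dachshund, Standard Smooth Haired") & (toy["ZIP Code"] == "98115")
via_mask = toy[mask]

# Counts for every (breed, ZIP) pair at once
counts = grouped.size()
```

Here counts is indexed by (breed, ZIP) pairs, which makes it a compact table of how prevalent each breed is in each area.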

Creating the word clouds

Below is a function that pulls out the data for a user-given Primary Breed and returns a word cloud. In this function I illustrate another way to slice the data, using the pandas str.contains() function on the Primary Breed column.

This has the benefit of letting us slice the data with very specific or more general breed names (e.g. Terrier vs. Terrier, Jack Russell). Using str.contains(), it is easy to separate or combine the two related breeds.

In [9]:
all_terrier_df = pet_data[pet_data['Primary Breed'].str.contains('Terrier') == True]
jack_terrier_df = pet_data[pet_data['Primary Breed'].str.contains('Terrier, Jack Russell') == True]
all_terrier_df.tail()
Out[9]:
       License Issue Date  License Number  Animal's Name  Species  Primary Breed                 Secondary Breed        ZIP Code
65987  December 29 2016    S125242         Walter         Dog      Terrier, Soft Coated Wheaten  NaN                    98103
66009  December 30 2016    S125248         Sousa          Dog      Terrier, Boston               NaN                    98117
66016  December 30 2016    S125203         Buddy          Dog      Terrier, Russell              NaN                    98177
66018  December 30 2016    S125221         Benjamón       Dog      Terrier                       Chihuahua, Short Coat  98109
66039  December 05 2016    S101614         Sammy          Dog      Terrier                       Maltese                98105
In [10]:
jack_terrier_df.head()
Out[10]:
     License Issue Date  License Number  Animal's Name  Species  Primary Breed          Secondary Breed  ZIP Code
90   August 11 2013      951542          Mickie         Dog      Terrier, Jack Russell  NaN              98105
282  August 11 2013      951542          Mickie         Dog      Terrier, Jack Russell  NaN              98105
809  April 03 2014       140471          NaN            Dog      Terrier, Jack Russell  Mix              98119
825  April 12 2014       214579          Sammy          Dog      Terrier, Jack Russell  Basset Hound     98177
897  June 05 2014        356422          Jazzie         Dog      Terrier, Jack Russell  Terrier, Rat     98104
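
Two optional arguments to str.contains() are worth knowing here. This is a sketch on made-up rows: na=False treats missing breeds as non-matches, so the == True comparison is not needed, and regex=False matches the text literally, which matters for breed names that contain parentheses.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last breed name appears in the breed list printed earlier
toy = pd.DataFrame({
    "Primary Breed": ["Terrier", "Terrier, Jack Russell", np.nan,
                      "Mixed Breed, Large (over 44 lbs fully grown)"],
})
breeds = toy["Primary Breed"]

# na=False turns missing values into False instead of NaN
all_terriers = toy[breeds.str.contains("Terrier", na=False)]

# regex=False avoids interpreting "(" as the start of a regex group
large_mixes = toy[breeds.str.contains("Large (over 44 lbs", regex=False, na=False)]

print(len(all_terriers), len(large_mixes))
```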

The breed_cloud function below returns a word cloud based on a user-given breed name. At a minimum, the user must provide a pandas dataframe containing the pet license data and a breed name. The user may optionally pass breed2 and mask arguments. The breed2 option supplies a Secondary Breed (e.g. breed = 'Terrier, Jack Russell', breed2 = 'Terrier, Rat'). The mask argument is a boolean. If mask is set to True, the user must also provide the file name of the mask image via mask_image. Use this option to shape the word cloud to match the given mask_image.

In [11]:
def breed_cloud(all_pets_df,breed,**kwargs):
    # Keep rows whose Primary Breed contains the breed string (NaN rows drop out)
    selection = all_pets_df['Primary Breed'].str.contains(breed) == True
    if 'breed2' in kwargs:
        # Optionally also require a matching Secondary Breed
        selection = selection & (all_pets_df['Secondary Breed'].str.contains(kwargs['breed2']) == True)
    breed_df = all_pets_df[selection]
    # Join all of the names into one space-separated string
    breed_names = breed_df["Animal's Name"].dropna()
    all_name_string = ' '.join(str(name) for name in breed_names)
    # Only load a mask image when mask=True is passed
    breed_mask = None
    if kwargs.get('mask', False):
        try:
            breed_mask = np.array(Image.open(kwargs['mask_image']))
        except KeyError:
            print ('If mask is True must provide mask image')
            return None
    try:
        wc = WordCloud(background_color="white",max_words=100,
                       max_font_size=75,mask=breed_mask).generate(all_name_string)
        return wc
    except ValueError:
        # generate() raises ValueError when there are no names to draw
        print ('Maybe that breed is not in the data?')

For our first example, let's make a word cloud for the Dachshunds.

In [12]:
wc_dachshund = breed_cloud(pet_data,'Dachshund')
plt.figure(figsize=(10,10))
plt.imshow(wc_dachshund, interpolation='bilinear')
plt.axis("off")
plt.show()
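
The most common names in a cloud can also be checked by counting them directly. This is a minimal sketch on made-up rows; the names and counts are hypothetical and say nothing about the real data.

```python
import pandas as pd

# Hypothetical rows, including one missing name and one non-Dachshund
toy = pd.DataFrame({
    "Animal's Name": ["Oscar", "Oscar", "Frank", None, "Oscar", "Luna"],
    "Primary Breed": ["Dachshund, Miniature Smooth Haired", "Dachshund, Standard Smooth Haired",
                      "Dachshund, Standard Long Haired", "Dachshund, Standard Wire Haired",
                      "Pug", "Dachshund, Miniature Long Haired"],
})

# Same selection idea as breed_cloud: any Primary Breed containing 'Dachshund'
is_dachshund = toy["Primary Breed"].str.contains("Dachshund", na=False)

# Count each name among the selected rows, most common first
name_counts = toy.loc[is_dachshund, "Animal's Name"].dropna().value_counts()
print(name_counts.head(3))
```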

Unsurprisingly, Oscar is a popular name for Dachshunds. Given that Dachshunds have such a distinctive profile, they also provide a good example of the mask option.

In [13]:
wc_dachshund = breed_cloud(pet_data,'Dachshund',mask=True,mask_image='dach.png')
plt.figure(figsize=(20,20))
plt.imshow(wc_dachshund, interpolation='bilinear')
plt.axis("off")
plt.show()
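
If you don't have a silhouette image handy, a mask can also be built as a plain numpy array. As I understand the wordcloud documentation, white pixels (value 255) are masked out and any other value is free to draw on; treat that convention as an assumption to verify. The ellipse shape and its size below are arbitrary choices for illustration.

```python
import numpy as np

# Arbitrary canvas size for the illustration
height, width = 200, 400
yy, xx = np.ogrid[:height, :width]

# True for points inside an ellipse centred on the canvas
inside = ((xx - width / 2) / (width / 2)) ** 2 + ((yy - height / 2) / (height / 2)) ** 2 <= 1

# 255 (white) marks masked-out pixels; 0 marks where words may be drawn
ellipse_mask = np.where(inside, 0, 255).astype(np.uint8)
print(ellipse_mask.shape)
```

An array like ellipse_mask could then be passed as the mask argument of WordCloud in place of the array loaded from dach.png.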

As another fun example, let's look at the differences between American, French, and English bulldogs.

In [14]:
wc_american = breed_cloud(pet_data,'Bulldog, American')
wc_french = breed_cloud(pet_data,'Bulldog, French')
wc_english = breed_cloud(pet_data,'Bulldog, English')
In [15]:
plt.figure(figsize=(25,25))
ax1 = plt.subplot(131)
ax2 = plt.subplot(132)
ax3 = plt.subplot(133)
ax1.imshow(wc_american, interpolation='bilinear')
ax1.set_title('American Bulldogs')
ax2.imshow(wc_french, interpolation='bilinear')
ax2.set_title('French Bulldogs')
ax3.imshow(wc_english, interpolation='bilinear')
ax3.set_title('English Bulldogs')
ax1.axis("off")
ax2.axis("off")
ax3.axis("off")
plt.show()
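
To compare the three varieties numerically rather than by eye, pd.crosstab can tabulate how often each name occurs per breed. The rows below are made up for illustration, so the resulting names are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the breed labels match those used above
toy = pd.DataFrame({
    "Animal's Name": ["Tank", "Tank", "Louie", "Louie", "Winston", "Winston", "Tank"],
    "Primary Breed": ["Bulldog, American", "Bulldog, American", "Bulldog, French",
                      "Bulldog, French", "Bulldog, English", "Bulldog, English",
                      "Bulldog, English"],
})

bulldogs = toy[toy["Primary Breed"].str.contains("Bulldog", na=False)]

# Rows are names, columns are breeds, cells are counts
name_by_breed = pd.crosstab(bulldogs["Animal's Name"], bulldogs["Primary Breed"])

# Most common name in each breed column
top_name_per_breed = name_by_breed.idxmax()
print(top_name_per_breed)
```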