DMI Google Image Scraper to Clarifai Tags

Tag images from Google queries using the Clarifai service.

Uses the DMI Google Image Scraper
Requires an API key from Clarifai
Works as a bookmarklet
Downloads a CSV

GENERATE BOOKMARKLET

Tagging algorithm (model) more info	Confidence threshold	tags as list	tags as columns	raw JSON
GENERAL General purpose model.
Apparel Recognizes fashion-related items.
Celebrity Celebrities resembling detected faces.
Demographics Age, gender, ethnic group of found faces.
Food Recognizes food items and dishes.
Moderation Detects unwanted content: gore, nudity...
NSFW Identifies nudity ("Not Safe For Work").
Textures and patterns Recognizes common visual patterns.
Travel Travel and hospitality-related concepts.
Wedding Wedding-related concepts.

How to get a Clarifai API key?

HOW TO USE

1. Browse to the DMI Google Image Scraper. This tool simplifies the querying of Google Images.

2. Enter your query in the field titled "Key words", or follow instructions.

3. Click the bookmarklet and WAIT. Your browser now connects to Clarifai to tag the images. This takes some time; a CSV file will download when the tagging is done. Open the JavaScript console for more detailed information during the process.

HELP

What is the purpose of the tool?

When you type a query in Google Images, you get a list of images. This tool lets you download that list for any number of queries. For each image on the list, it adds data from an image recognition service named Clarifai. Clarifai uses machine learning to identify elements of the picture: objects, but also faces and their demographic attributes.

The tool requires a setup process, but once it is done the data can be gathered in one click. The setup can be done entirely on this page. It requires you to get a Clarifai API key, a personal identifier that allows you to get data from the Clarifai service. In the end it generates a bookmarklet: a mini script embedded in a bookmark. To use it, you just type your query in the DMI Google Image Scraper and click the bookmarklet to download the data.

How to get a Clarifai API Key

Sign up to Clarifai to get your API key. Just follow the instructions. It does not require any payment or card number. Beyond the first 5000 images per month, however, it will stop working unless you pay for it.

Why an API key? Clarifai tags your images with machine learning techniques. You send it images via the web and it responds with tags. The "API" is the door to the service: it has both an address and a lock that requires a key. The key is personal, and Clarifai uses it to monitor your use, since the service is only free up to 5000 images a month. This tool knows the address, but you need to tell it your key. The resulting bookmarklet will only be visible to you, so no one else will get your API key. You can go to Clarifai to check how many of your monthly free queries are used.
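
To make the "address and lock" idea concrete, here is a minimal Python sketch of what such a request could look like. It only builds the request and never sends it. The endpoint shape, the "Key" authorization header and the JSON body layout are assumptions based on Clarifai's public v2 REST API and may not match what this tool does internally; YOUR_API_KEY, YOUR_MODEL_ID and the image URL are placeholders.

```python
import json
import urllib.request

# Assumed endpoint shape (the "address"); not taken from this page.
API_URL = "https://api.clarifai.com/v2/models/{model_id}/outputs"


def build_tag_request(api_key, model_id, image_url):
    """Build (but do not send) a request asking one model to tag one image."""
    body = {"inputs": [{"data": {"image": {"url": image_url}}}]}
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The "key" that opens the lock; assumed header format.
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tag_request(
    "YOUR_API_KEY", "YOUR_MODEL_ID", "https://example.com/image.jpg"
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Because the key travels with every request, anyone who has it can spend your free quota, which is why the bookmarklet should stay private.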

How to use the settings?

Pick the tagging model(s) relevant to you. See the list below. The bookmarklet you generate will use the specified Clarifai models and data formats.

Each model you use spends one API call per image tagged, so the more models you use, the faster you reach the limit of 5000 free API calls per month. Example: you use 3 models and you tag 100 images. This uses 300 API calls each time you run the bookmarklet.
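
The arithmetic above can be sketched in a few lines of Python. The function names are hypothetical; the only facts used are the ones stated here: one call per model per image, and 5000 free calls per month.

```python
FREE_CALLS_PER_MONTH = 5000  # free quota stated on this page


def api_calls_per_run(num_models, num_images):
    """Each selected model spends one API call per tagged image."""
    return num_models * num_images


def free_runs_per_month(num_models, num_images):
    """Number of complete runs that fit within the free monthly quota."""
    return FREE_CALLS_PER_MONTH // api_calls_per_run(num_models, num_images)


print(api_calls_per_run(num_models=3, num_images=100))    # 300
print(free_runs_per_month(num_models=3, num_images=100))  # 16
```

With 3 models and 100 images, the 17th run of the month would exceed the free quota.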

Three data formats are available. Each corresponds to a different need, and you can pick multiple. Two of them require a threshold, a number between 0 and 1 that you can set.

1. Tags as list. The easiest data format. Concepts with a confidence score above the specified threshold will appear as a list of tags in a single column. Example:

Image	Concepts
A	fashion, business, leather
B	leather, retro
C	business, retro, coffee, shopping
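
As an illustration, the thresholding behind "tags as list" could look like the Python sketch below. It is not the bookmarklet's actual code. The confidence values come from the raw JSON example on this page, and treating "above" as strictly greater than the threshold is an assumption.

```python
def tags_as_list(concepts, threshold):
    """Join the names of all concepts scoring above the threshold."""
    return ", ".join(c["name"] for c in concepts if c["value"] > threshold)


concepts_a = [
    {"name": "fashion", "value": 0.99863684},
    {"name": "business", "value": 0.9962599},
    {"name": "leather", "value": 0.97945905},
    {"name": "retro", "value": 0.27526324},
]

print(tags_as_list(concepts_a, threshold=0.9))  # fashion, business, leather
```

Lowering the threshold to 0.2 would also let "retro" through, so the threshold directly controls how many tags each image gets.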

2. Tags as columns. Still fairly easy, but richer. Concepts with a confidence score above the specified threshold will appear in multiple columns. Each tag has its own column. Example:

Image	Fashion	Business	Leather	Retro	Coffee	Shopping
A	1	1	1
B			1	1
C		1		1	1	1

Important note: "tags as columns" can count the same tag several times for a single image. This happens for models that can recognize multiple items per image. For instance, the model "Demographics" can find multiple faces in a single image and tag them all. If you have three feminine faces in an image, the column "feminine" will have the value "3" for that image.
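
The counting behaviour can be sketched in Python as follows. This is an illustration only, not the tool's implementation: the image names and confidence values are made up, and the strict "greater than" comparison is an assumption.

```python
from collections import Counter


def tags_as_columns(images, threshold):
    """Count, per image, how often each tag scores above the threshold.

    `images` maps an image name to a list of {"name", "value"} dicts,
    where the same tag may appear several times (one entry per face).
    Returns the column names and one row of cell values per image.
    """
    counts = {
        image: Counter(c["name"] for c in concepts if c["value"] > threshold)
        for image, concepts in images.items()
    }
    # Collect column names in order of first appearance.
    columns = []
    for counter in counts.values():
        for name in counter:
            if name not in columns:
                columns.append(name)
    # Tags that are absent for an image become empty cells.
    rows = {
        image: [counter.get(name, "") for name in columns]
        for image, counter in counts.items()
    }
    return columns, rows


# Hypothetical "Demographics" results: three feminine faces in one image.
images = {
    "group.jpg": [
        {"name": "feminine", "value": 0.98},
        {"name": "feminine", "value": 0.95},
        {"name": "feminine", "value": 0.91},
        {"name": "masculine", "value": 0.97},
        {"name": "masculine", "value": 0.40},
    ],
    "solo.jpg": [{"name": "masculine", "value": 0.99}],
}

columns, rows = tags_as_columns(images, threshold=0.9)
print(columns)
print(rows)
```

Here "group.jpg" gets 3 in the "feminine" column and 1 in "masculine", because the second masculine face falls below the threshold.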

3. Raw JSON. This format is the hardest to use but the most complete. It simply stores the response from Clarifai in a single column. The data is formatted as JSON, structured as a tree. It is therefore very hard to use in a spreadsheet but easy to use in a script, and it is a good way to log the results for further use. Note: it also contains concepts under the confidence threshold. Example:

Image	Concepts
A	{"concepts": [ {"id": "ai_GC6FB0cQ", "name": "fashion", "value": 0.99863684}, {"id": "ai_fBH5DFMJ", "name": "business", "value": 0.9962599}, {"id": "ai_2KV5G1Fg", "name": "leather", "value": 0.97945905}, {"id": "ai_XN1QLhwp", "name": "retro", "value": 0.27526324}, {"id": "ai_KWmFf1fn", "name": "coffee", "value": 0.1743866}, {"id": "ai_GC6FB0cQ", "name": "unicorn", "value": 0.0054384} ]}
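
For readers who want to process this column in a script, here is a minimal Python sketch. It assumes the downloaded CSV has columns named "Image" and "Concepts" as in the example above; the real file may use other headers. A small in-memory CSV stands in for the download.

```python
import csv
import io
import json

# A shortened version of the example cell above.
raw = (
    '{"concepts": ['
    '{"id": "ai_GC6FB0cQ", "name": "fashion", "value": 0.99863684}, '
    '{"id": "ai_XN1QLhwp", "name": "retro", "value": 0.27526324}]}'
)

# Build a tiny in-memory CSV standing in for the downloaded file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Image", "Concepts"])  # assumed column names
writer.writerow(["A", raw])
buffer.seek(0)


def concepts_above(csv_file, threshold):
    """Parse the raw JSON column and keep concepts scoring above the threshold."""
    result = {}
    for row in csv.DictReader(csv_file):
        data = json.loads(row["Concepts"])
        result[row["Image"]] = [
            (c["name"], c["value"])
            for c in data["concepts"]
            if c["value"] > threshold
        ]
    return result


print(concepts_above(buffer, threshold=0.5))  # {'A': [('fashion', 0.99863684)]}
```

Since the raw JSON keeps every concept, you can apply a different threshold later without running the bookmarklet again or spending more API calls.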

What are the different algorithms available?

Clarifai offers multiple algorithms, or "models". Each is trained differently and recognizes different concepts. Some are more specialized (NSFW only tells whether an image is "safe for work" or not) than others (GENERAL recognizes over 11,000 concepts).

Refer to the Clarifai Model Gallery for complete information, or look at the summary below.

GENERAL. General purpose model. Recognizes over 11,000 different concepts.
Examples of concepts: Afternoon Art Beautiful Bicycle Happiness Togetherness

Apparel. Recognizes fashion-related items.
Examples of concepts: Blouse Bracelet Casual Dress Fleece Jacket Loafers Pant Suit

Celebrity. Identifies celebrities resembling detected faces.
Examples of concepts: Marilyn Monroe Ice Cube Jennifer Lopez Angelina Jolie Jake Gyllenhaal

Demographics. Predicts the age, gender, and cultural appearance of detected faces.
Examples of concepts: 18 94 feminine masculine asian black or african american

Food. Recognizes food items and dishes.
Examples of concepts: Apple Avocado Bread Ice Cream Sandwich Steak

Moderation. Recognizes unwanted content: gore, drugs, nudity.
Examples of concepts: Gore Drug Explicit Suggestive Safe

NSFW (Not Safe For Work). Identifies nudity: "safe for work" or "not safe for work".
It uses only two concepts: NSFW (Not Safe For Work) SFW (Safe For Work)

Textures and patterns. Recognizes common visual patterns.
Examples of concepts: feathers woodgrain petrified wood glacial ice veined metallic

Travel. Travel and hospitality-related concepts.
Examples of concepts: Balcony Beach Breakfast Buffet Casino Kids Area Restaurant

Wedding. Wedding-related concepts.
Examples of concepts: Bouquet Bride Cake Ceremony Flowers Groom