PyClist: Introducing the Powerful Craigslist Searcher

Introduction

I am finally ready to talk publicly about an open source program I am writing that I think will benefit a lot of very different people. It is a Python module and application called pyclist (Python Craigslist): a programming library and group of applications that automate searching and filtering Craigslist listings. I am sure you already know what Craigslist is, so I won’t get into that, but I will discuss some of the practical limitations that keep you from using Craigslist to its fullest potential.

The Problem

Craigslist is used to sell all sorts of goods and services. If you are a homesteader like me, there are always items that you need and are “in the market for”. The major problem with Craigslist, and with many other classified providers, is that you have no way to set up long-standing searches and customizable filtering to really make use of the service. As a result, you end up using Craigslist mostly for the things you are looking for right now, not the things you’re in the market for. Craigslist, like any other search engine, requires you to actively conduct the search, try multiple search phrases, search multiple cities (if you want something that may not be local), and in general use up a lot of time. What I have learned over the past 2 years is that most of those actions are a sign that a computer could be doing them for you.

The only things a human really needs to do are choose good search words, and then read the actual titles and listings to decide which ones to look into further. A computer can do the searching for you (on a schedule), remove duplicate entries, filter out bad entries, and search across multiple Craigslist cities. This is precisely what pyclist does. There have been too many times when we were looking for some item, but it wasn’t important enough to justify continually checking Craigslist for it. As homesteaders we are always in the market for things like tools, ball jars, potting materials, compost, bedding/straw, animals, garage sales, lumber and other things. Imagine how much time it takes to search for all of those, and do a pretty good job of it. You end up waiting until the last possible second, and you may have already missed good opportunities at the price you were looking for. That is why I created this program.

Features

Here are the main features of pyclist. All of these features currently work.

– Performs searches straight from your computer
– Allows searches to be run automatically
– Conducts multiple searches using any number of different phrasings
– Conducts searches across multiple cities/locations
– Removes all duplicate listings between multiple searches
– Allows receiving only the “local” city matches, even across multiple city searches
– Allows filtering on either the header information (what you see first when you search) or the listing itself, so that common trickster entries or common word usages can be removed
– Pulls out the phone number and puts it in an easy-to-read format (Craigslist modifies the numbers, I guess to beat automated tools; unfortunately for them, my code works better)
– Caches and phases out old results, for quick repeat searches
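The caching feature in the last bullet could look something like the following minimal sketch: an in-memory cache, keyed by something like the search URL, that drops entries older than a maximum age. The class name `ResultCache` and the `max_age` parameter are hypothetical, and pyclist’s real cache (which presumably persists between runs) may work quite differently.

```python
import time


class ResultCache:
    """In-memory cache that phases out entries older than `max_age` seconds.

    Hypothetical sketch, not pyclist's actual cache.
    """

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (time stored, value)

    def put(self, key, value):
        self._entries[key] = (self.clock(), value)

    def get(self, key):
        """Return the cached value, or None if it is missing or too old."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.max_age:
            del self._entries[key]  # stale: phase it out
            return None
        return value

    def prune(self):
        """Remove every stale entry and return how many were removed."""
        now = self.clock()
        stale = [key for key, (stored_at, _) in self._entries.items()
                 if now - stored_at > self.max_age]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

For a repeat search, a caller would check `cache.get(url)` before fetching and call `cache.put(url, results)` after a fresh fetch.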

How it works

Right now the application is purely a command line tool, so all the Linux/Unix guys out there will be quite content with that. It is currently built with Python 3, with dependencies on httplib2 and BeautifulSoup, so the program is fully cross-platform: it will run on any operating system that can run Python 3. Eventually it could even run on mobile devices, but that is some time in the future.
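Since the post mentions BeautifulSoup, here is a rough, dependency-free sketch of the parsing step that uses the standard library’s `html.parser` instead (a swapped-in technique, chosen so the example runs without third-party packages). It assumes result titles are `<a>` tags with the class `hdrlnk`; that class name is an assumption about Craigslist’s markup, which changes over time, and `parse_listing_links` is a hypothetical name. With BeautifulSoup the same selection would be roughly `soup.select("a.hdrlnk")`.

```python
from html.parser import HTMLParser


class ListingLinkParser(HTMLParser):
    """Collect (href, title) pairs from <a> tags carrying `link_class`.

    Assumption: "hdrlnk" marks result titles; treat it as a placeholder.
    """

    def __init__(self, link_class="hdrlnk"):
        super().__init__()
        self.link_class = link_class
        self.links = []
        self._href = None   # href of the result link we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.link_class in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_listing_links(html):
    """Return (href, title) pairs for every result link in `html`."""
    parser = ListingLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```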

I will be building a GUI soon to bring the application to market, so that any average person will have full access to this powerful searching capability. Since the underlying code is designed as an API, the graphical user interface (GUI) will just be a “skinned” version of the command line, with many of the options and capabilities more fully fleshed out, which should make it extremely easy for anybody to use. In fact, I intend to build part of it as an “improved” Craigslist interface.

In Action

OK, here we go. I am going to show screenshots of me doing different things, from the basic to the complex. If you aren’t familiar with running programs from a command line, don’t worry: you will never have to deal with it, since I will be making a graphical interface as well. I just want to show that this program does indeed exist and is extremely powerful and flexible.

A Simple Search

Let’s do a simple search for ball jars. I will use the -s (state) option for Louisiana, the city “New Orleans”, and the search term “ball jars”. Note that the search on the Craigslist website itself also shows “non-local” results, while the command line version shows only the local ones (ignore the fact that the non-local result is the same as the local one… haha, this person really wants to sell these jars apparently).

Ball jars in New Orleans, Louisiana, with the -l command line option, which shows only the LOCAL results.

This is what that search looks like on Craigslist.
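For readers curious what a single search boils down to, here is a minimal sketch of building one search URL. It assumes the `https://<site>.craigslist.org/search/<category>?query=...` pattern, with `site` being a subdomain such as `neworleans` and `sss` the for-sale category; `search_url` is a hypothetical name, and pyclist’s own URL construction may differ.

```python
from urllib.parse import urlencode


def search_url(site, query, category="sss"):
    """Return a search URL for one Craigslist site, category and query.

    Assumption: the /search/<category>?query=... path; urlencode turns
    spaces into "+" for the query string.
    """
    return "https://{}.craigslist.org/search/{}?{}".format(
        site, category, urlencode({"query": query}))
```

For example, `search_url("neworleans", "ball jars")` gives `https://neworleans.craigslist.org/search/sss?query=ball+jars`.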

A “Full Search”

Now I will do a “full search” (the -f option). This shows the full contents of all of the results.

The same search as above, but it shows the content of the pages as well as the header information.

This is what the full page looks like on Craigslist, for reference.

Search an Entire State

Here I will do another search, but this time, instead of using “New Orleans”, I use the word “state”, which tells the searcher to search every city in Louisiana. (Note: the -l (local) option pairs well with this, since you’re already looking at the entire state.)

Here I am searching for an excavator in all of Louisiana.
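The “state” keyword could be handled along these lines. This is a hypothetical sketch: the `SITES_BY_STATE` table is a hand-written, partial example rather than the list pyclist actually pulls from Craigslist, and `expand_cities` is an illustrative name.

```python
# Hypothetical, partial table for illustration only; a real tool would
# build this from Craigslist's own list of sites.
SITES_BY_STATE = {
    "louisiana": ["baton rouge", "lafayette", "new orleans", "shreveport"],
}


def expand_cities(state, cities):
    """Return the sites to search in `state`.

    The special word "state" means every site in that state; otherwise
    only the named cities that exist in the table are kept.
    """
    available = SITES_BY_STATE[state.lower()]
    requested = [city.lower() for city in cities]
    if "state" in requested:
        return list(available)
    return [site for site in available if site in requested]
```

Each returned site would then get its own search, with the results merged afterwards.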

Here is an example of chaining the output to another tool. I send the output to grep to find only posts from October. I then run the search again, this time also passing the grep results to wc (word count), which reports 23 results (the far-left number).

In the above example I use a little Linux/Unix command line trickery to get the results I’m looking for. I wanted only the October results, and then I passed them to a tool that counts lines, which tells me there are only 23 excavator postings in all of Louisiana for the past 6 days.

Configuration Searches

This is where the program really shines. The command line searching capability will definitely get you off the ground, but configuration searches bring out the program’s full capability. Configuration searches work like this: a default_searches.ini file provides the base configuration for how you want to do your searches, and individual search files can override those settings. This saves you from repeatedly entering the same search information (city, state, country) while still letting you change it depending on what you’re looking for.
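The defaults-then-override layering maps naturally onto Python’s `configparser`, where a value read later replaces the same key read earlier. Below is a minimal sketch; the section name `[search]` and keys such as `cities` and `terms` are hypothetical, since the real pyclist keys appear only in the screenshots.

```python
import configparser

# Hypothetical file contents; the real pyclist keys appear only in the
# screenshots, so these names are assumptions.
DEFAULT_SEARCHES_INI = """
[search]
country = us
state = louisiana
cities = new orleans
local_only = yes
"""

GREATPYRE_INI = """
[search]
cities = state
terms = great pyrenees, lgd
"""


def load_search(*ini_texts):
    """Read INI texts in order; a later text overrides earlier values."""
    parser = configparser.ConfigParser()
    for text in ini_texts:
        parser.read_string(text)
    return dict(parser["search"])


def as_list(value):
    """Split a comma-separated setting into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

With real files, `parser.read(["default_searches.ini", "greatpyre.ini"])` gives the same layering, because `ConfigParser.read` processes the files in order.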

In this particular example I am looking for a Great Pyrenees, and I want to search all of Louisiana (and only Louisiana) for one, but also Gulf Port, Mississippi, since it is actually quite close to me. I am also interested in livestock guardian dogs (LGD) in general, so I want to search for “LGD” as well. The first picture shows the two configuration files.

Note the Great Pyrenees configuration file: it lets me chain together multiple searches under the single purpose of looking for a Great Pyrenees. You can select various cities, multiple search words, and different configurations.
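Chaining searches like this multiplies quickly, because every search word is run against every selected city. A minimal sketch of that expansion, using the hypothetical names `SearchJob` and `plan_searches` (not taken from pyclist):

```python
from itertools import product
from typing import NamedTuple


class SearchJob(NamedTuple):
    """One concrete query: a single Craigslist site and a single search term."""
    site: str
    term: str


def plan_searches(sites, terms):
    """Expand one search file into every (site, term) combination."""
    return [SearchJob(site, term) for site, term in product(sites, terms)]
```

With, say, 8 sites and 3 search words, that is already 24 requests, which is how one configuration file can mean dozens of searches behind the scenes.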

I also have filters in place that let me remove bad results using keywords. It so happens that there is a location somewhere around New Orleans that is abbreviated LGD, so whenever I search for LGD (livestock guardian dog) I get those results too; the filters remove them. Lastly, because of the way the application works, you don’t have to fully type out city or state names. It matches a city by checking whether the word matches the beginning of the name (so “mississ” here will match Mississippi, and “gulf” will match Gulf Port).
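Both ideas in this paragraph, prefix matching of names and keyword filters on the header or the full listing, fit in a few lines. This is a hypothetical sketch with illustrative function names; the exclusion words you pass in would be whatever terms identify the unwanted results, and whole-word matching is a design choice here, not necessarily what pyclist does.

```python
import re


def resolve_names(typed, names):
    """Return the names that start with what the user typed (case-insensitive).

    e.g. "mississ" -> "mississippi", "gulf" -> "gulf port".
    """
    prefix = typed.strip().lower()
    return [name for name in names if name.lower().startswith(prefix)]


def contains_word(text, word):
    """True if `word` appears in `text` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def keep_listing(header, body, exclude_words, check_body=False):
    """Drop a listing if an excluded word is in its header (or its body too)."""
    text = header + ("\n" + body if check_body else "")
    return not any(contains_word(text, word) for word in exclude_words)
```

Checking only the header is cheaper, since the full listing does not have to be fetched; checking the body catches more of the trickster entries.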

OK, now that we have our searches in place, let’s run them. In this first scenario I am using the -i option, which lets you select exactly which search file you want to use. You can have as many search files as you want, and each one is treated as a separate “search”: all of the searches conducted in one file are treated as though they were one big ol’ complicated search (even though behind the scenes it’s doing dozens of searches or more).

Here are my results for the greatpyre.ini file. It looks like there are 4 listings in all of Louisiana that are about Great Pyrenees dogs. (The last one is clearly not a good match…)
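Treating a whole file as one search mostly comes down to merging and de-duplicating the results of the individual searches. A rough sketch (hypothetical names, not pyclist’s code) that also records which search words matched each listing, similar to what the full config search displays:

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    url: str
    title: str
    matched_terms: list = field(default_factory=list)


def merge_results(results_by_term):
    """Merge per-term results into one list with duplicates removed.

    `results_by_term` maps a search term to a list of (url, title) pairs.
    Listings are keyed by URL (an assumption: the same post keeps the same
    URL across searches), keep first-seen order, and remember every term
    that matched them.
    """
    merged = {}
    for term, pairs in results_by_term.items():
        for url, title in pairs:
            listing = merged.setdefault(url, Listing(url, title))
            if term not in listing.matched_terms:
                listing.matched_terms.append(term)
    return list(merged.values())
```

Keying on the URL will not catch a seller who reposts the same item as a new posting; that would need fuzzier matching on title or body.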

Here I will do the same search again, but with the -f (full search) option so you can see the full content of the results back to back. Note that the phone number is right on top, so it is easy to find.

Full config searches show which search word made the match, the header line, and the full content of the listing, with the phone number right on top. I find this view really useful for quickly scanning for what I might want. The GUI version will have pictures.
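The phone number extraction could be approached like this. The post doesn’t describe how Craigslist alters the numbers, so this sketch simply assumes numbers may be broken up by spaces, dots, dashes or parentheses, or partly spelled out in words, and it only recognizes 10-digit US numbers; `extract_phone_numbers` is a hypothetical name.

```python
import re

# Assumption: digits may be spelled out ("five oh four") in the listing.
WORD_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
WORD_RE = re.compile(r"\b(" + "|".join(WORD_DIGITS) + r")\b", re.IGNORECASE)

# Ten digits with optional separators between them, not touching other digits.
PHONE_RE = re.compile(r"(?<!\d)(?:\d[\s.()\-]*){9}\d(?!\d)")


def extract_phone_numbers(text):
    """Return 10-digit numbers found in `text`, formatted as (XXX) XXX-XXXX."""
    normalized = WORD_RE.sub(lambda m: WORD_DIGITS[m.group(1).lower()], text)
    found = []
    for match in PHONE_RE.finditer(normalized):
        digits = re.sub(r"\D", "", match.group())
        formatted = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
        if formatted not in found:  # the same number often appears twice
            found.append(formatted)
    return found
```

Words like “oh” and “one” also occur in ordinary text, so a real implementation would want more guarding than this before trusting a match.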

Showing Cities

There are some other helper utilities that query Craigslist for information that can be used to do better searches, or that is just interesting in general. One of them shows all the cities/sites Craigslist has, which can be used as arguments to the program.
In this first example, I am printing them all but using the “head” command to show only the first 20 entries, so you have an idea of what it looks like.

An example of the print_cities output.

Here I am narrowing down the list by using the -s (state) option and passing in Louisiana. This shows only the sites associated with that state, and these are the values you can use for cities in your searches.

These are all the Craigslist locations in Louisiana.

In this example I am printing every City/Craigslist in the world out, then I am looking only for lines that start with — (those are the cities), and then I pass it to wc (word count). This shows me how many Craigslist pages there are out there! Cool!

There is a ton of research one could do with this tool. I mean this would be a good trivia question, and I answered it in literally three chained commands.

Showing Categories

Another utility shows the categories available for the Craigslist site you want to search. Normally all searches are done with “sss” (for sale), but there is no reason you couldn’t search the community or services sections as well! The first screenshot shows all available categories for New Orleans.

In this screenshot I am choosing the section I want to look at, in this case the jobs section.

Conclusion

Well, that is the introduction to this new tool I am developing. Other than the GUI, it’s pretty much done. For you computer guys out there: I am looking at building the GUI with PyQt (Qt5), but I’m not sure what other people think of that. I might even build the GUI in an entirely different language altogether, mostly because of how irritating PyQt is for cross-platform work (Windows, Linux, OS X).

One component that is not built yet (although any Linux/Unix power user could make their own) is a scheduling interface that would let you run these searches without entering any commands yourself. (A Linux user would know about “cron” jobs, which could do this without building any new tools.) Depending on how time-sensitive the search is and how often you want the latest matches, you could run it every few hours, once a day, or even just once a week. Maybe you just want a roll-up of garage/yard sales in your area, so a search every Thursday showing what’s coming up Friday and Saturday would be good.

I should also mention that it will be released under the GNU General Public License (GPL), unless someone specifically requests an LGPL version (and gives me a real reason to release one). That basically means the software must remain open and free (as in freedom) no matter who gets it. While the software will be free, as in freedom, I will be coming up with a very reasonable pricing scheme sometime in the near future. I may even consider a TSP members support brigade discount… who knows. Since I think many people are confused about how I can sell free software, I’ll quickly explain what it means. To get the latest or certified version of the software from me, you’ll have to pay a fee, which helps me spend more time developing useful software. After that, anything goes. If you want to give it to your brother or mother to use (for free), great! If you want to share it with your friends or host it on your own server, that’s cool too. The only thing you can’t do is restrict what somebody else can do with the software when you give it to them (whether you do it for money or for free doesn’t matter).

If you aren’t a computer person and a lot of these screenshots went way over your head, don’t worry at all. This is just a behind-the-scenes look at the program and a testament that “yes, it is working and operational, I just need to put a pretty face on it.” As soon as I get something going I’ll post an update with those screenshots, so people can tear them to shreds. Please let me know what you think and whether you’re interested! Any and allllll ideas (including wild and crazy ones) would be greatly appreciated. If this tool is for you, tell me what you’ll need it to do, and how you might want to be notified of matches.

Well this break is over for me… back to development…

Cheers
