Collecting Wikiquote Data Using Python 
You've gotta love collecting quotes: not only might they teach us, but reviewing the quotations others revere helps us better understand what motivates today's majorities.

Like many others, I also love Python 3. Not only is Python 3 finally ready for prime time, but, from gainful employment to games, Python's community is simply the most amazing set of programming enthusiasts in our modern world. If you want to do something, chances are extremely good that someone has already written a package that can help you do it a LOT quicker.

So it was with collecting Wikiquotes!

Quotes Matter


I have been collecting quotes since my college days. Indeed, from then until today I have amassed a collection of around 100,000 of them.

So when it came time to snoop around Wikiquote, how could any 'quotie worthy of the moniker NOT try to collect 'em all as well?

So as I sat down to "learn something" on this traditional occidental day of rest, I decided to give the wikiquotes package a try.

After pip'ing it down, here is what I came up with:

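    import wikiquotes

    # Each character is used as a Wikiquote search term.
    alpha = "abcdefghijklmnopqrstuvwxyz1234567890"
    major = 1  # running count of authors (search hits)
    minor = 1  # running count of quotes written

    with open("./wikiquote_2017_10_22.txt", "w", encoding="utf-8") as results:
        for char in alpha:
            try:
                authors = list(wikiquotes.search(char, "english"))
            except Exception:
                print("error", char, minor, major, "error", "search failed", sep="|", file=results)
                continue
            for author in authors:
                print(char, major, author)
                try:
                    quotes = wikiquotes.get_quotes(author, "english")
                except Exception:
                    # Log the failure, then move on to the next author rather than
                    # abandoning the rest of this character's results.
                    print("error", char, minor, major, author, "no quotes", sep="|", file=results)
                    major += 1
                    continue
                for quote in quotes:
                    text = str(quote)
                    # Skip what look like editorial notes rather than quotes:
                    # proposals and signed, UTC-timestamped comments.
                    if text.startswith("proposed by") or "(UTC)" in text:
                        continue
                    print("tbd", char, minor, major, author, text, sep="|", file=results)
                    minor += 1
                major += 1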
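The script above fires one request per author as fast as the loop can run. In the spirit of the footnote below about keeping server load reasonable, a simple courtesy is to space the requests out. What follows is a minimal sketch, not the code I actually kept locally: polite_map is a hypothetical helper name, and the one-second default interval is an assumed value, not a published Wikimedia limit.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def polite_map(fetch: Callable[[T], R], items: Iterable[T],
               min_interval: float = 1.0) -> Iterator[Tuple[T, R]]:
    """Call fetch(item) for each item, sleeping as needed so that successive
    calls start at least min_interval seconds apart (min_interval is an
    assumed courtesy value). Yields (item, result) pairs lazily."""
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Wait only for whatever part of the interval has not yet elapsed.
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        yield item, fetch(item)
```

In the collector, the per-author loop could iterate over polite_map(lambda a: wikiquotes.get_quotes(a, "english"), authors) instead of calling get_quotes directly. Because an exception raised by fetch ends the generator, any per-author error handling would need to live inside the function passed in.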

Using the above, we were able to download 17,068 entries to review. The fact that we ended up with an even 360 'authors' (exactly 10 for each of the 36 search characters) clearly indicates that each search stopped at ten hits, so I did not get 'em all the first time 'round... but I eventually got the vast majority [5,225 topics? 153,621 quotes?] of them... (*)
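Counts like these can be recomputed from the output file itself. Here is a minimal sketch, assuming the pipe-delimited layout the script writes (status|char|minor|major|author|quote); tally is a hypothetical name. Since print() writes each quote verbatim, a quote containing a newline spills onto extra lines, which the sketch simply skips.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def tally(lines: Iterable[str]) -> Dict[str, object]:
    """Summarize records written by the collector script.

    Assumes each record is status|char|minor|major|author|quote,
    where status is "tbd" (a quote) or "error".
    """
    quotes = 0
    errors = 0
    authors: Set[str] = set()
    by_char: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # maxsplit=5 keeps any "|" inside the quote text in the last field.
        fields = line.split("|", 5)
        if len(fields) < 6 or fields[0] not in ("tbd", "error"):
            continue  # continuation of a multi-line quote, or a malformed line
        status, char, _minor, _major, author, _text = fields
        if status == "error":
            errors += 1
            continue
        quotes += 1
        authors.add(author)
        by_char[char].add(author)
    return {
        "quotes": quotes,
        "errors": errors,
        "authors": len(authors),
        "authors_per_char": {c: len(names) for c, names in by_char.items()},
    }
```

Fed an open handle on wikiquote_2017_10_22.txt, the per-character counts make any cap on search results easy to spot. Authors whose pages yielded no quotes never produce a "tbd" line, so they are not counted here.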

Quality Comments


Overall, I should note that I was disappointed with the quality of the quotations. While there were some decent citations that I did not have, a lot of the jibes seem far too fatuous: desperate attempts to garner cheap publicity for far too many unmemorable nouns. More than a few pages have no quotations on them at all.
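Spotting the citations I did not already have amounts to a set difference on normalized text. A minimal sketch, assuming both the existing collection and the Wikiquote results are available as plain strings; normalize and new_quotes are hypothetical names. Exact matching after normalization will still miss paraphrases and variant translations of the same quote.

```python
import re
import unicodedata
from typing import Iterable, List


def normalize(quote: str) -> str:
    """Reduce a quote to a comparison key: Unicode NFKC form, case-folded,
    punctuation removed, runs of whitespace collapsed to single spaces."""
    text = unicodedata.normalize("NFKC", quote).casefold()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def new_quotes(candidates: Iterable[str], collection: Iterable[str]) -> List[str]:
    """Return candidates whose normalized form is not already in the
    collection, keeping only the first of any duplicates, in original order."""
    seen = {normalize(q) for q in collection}
    fresh = []
    for quote in candidates:
        key = normalize(quote)
        if key and key not in seen:  # skip empty keys (punctuation-only text)
            seen.add(key)
            fresh.append(quote)
    return fresh
```

Case-folding and dropping punctuation lets "Stay hungry, stay foolish." match "stay hungry; stay foolish!"; anything fuzzier would be a separate, more expensive pass.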

Yet, as mentioned previously, as we 'quoties seek to separate the gold from the gall, history has an annoying tendency over time to ensure that only the strong survive.

Enjoy the journey!

--Randall

p.s. If you would like to get the results of today's diversion, we just uploaded them to the Mighty Maxims Project.

(*) To keep the server load reasonable for our Wikipedia friends, I will keep THAT bit of code on my own local machine... I'm still sorting through them! :)
