Clearly, this blog has largely become a project blog for my analyses of reddit (particularly /r/washingtondc), so I'm going to stop opening posts with the project's background.
So far I've mostly just posted results, but I want to get into the habit of documenting my whole process. This is my first "brainstorming" post, and it will probably be the first of many.
Do people use reddit more when it's shitty out? Can we show this analytically? Are there very active users who are likely to stop redditing when it's really nice out? Can we identify homebodies (people whose redditing behavior is barely affected by how nice the weather is)?
1: Aggregate reddit usage BY DAY for r/dc. There might not be enough data to do anything meaningful with r/dc alone, but we can try. Consider computing each user's average usage for each day of the week and normalizing their daily usage relative to that mean. Then average the normalized values over the users available on each day (otherwise usage will always look higher later in the dataset, simply because there are more users by then). Consider also limiting the dataset to users who commented within some threshold window near the end of the dataset, so we don't pick up any "dead" users.
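A minimal sketch of step 1 in plain Python, assuming the comments have already been reduced to (user, date) pairs. The function name `daily_usage_index`, the ratio-to-mean normalization, and treating a user as "active" from their first comment to their last are my assumptions about one reasonable way to do this, not a finished method.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date, timedelta
from statistics import mean


def daily_usage_index(comments: Iterable[tuple[str, date]], min_last_seen: date) -> dict[date, float]:
    """Average normalized activity per day across the users active that day.

    comments: one (user, date) pair per comment (assumed input shape).
    min_last_seen: users whose last comment is earlier than this are treated as "dead".
    """
    counts: dict[str, dict[date, int]] = defaultdict(lambda: defaultdict(int))
    for user, day in comments:
        counts[user][day] += 1

    per_day: dict[date, list[float]] = defaultdict(list)
    for user, by_day in counts.items():
        first, last = min(by_day), max(by_day)
        if last < min_last_seen:
            continue  # looks like a "dead" user, so drop them
        # Every day from the user's first to last comment, including zero-comment days.
        window = [first + timedelta(days=i) for i in range((last - first).days + 1)]
        # The user's mean comments per day, computed separately for each weekday (0 = Monday).
        dow_mean = {
            dow: mean(by_day.get(d, 0) for d in window if d.weekday() == dow)
            for dow in {d.weekday() for d in window}
        }
        for d in window:
            m = dow_mean[d.weekday()]
            if m > 0:  # skip weekdays on which this user never comments
                per_day[d].append(by_day.get(d, 0) / m)
    # Averaging over the users present each day keeps later dates from looking
    # busier just because the dataset contains more users by then.
    return {d: mean(vals) for d, vals in sorted(per_day.items())}
```

A value of 1.0 on a given day means the users around that day commented about as much as they usually do on that weekday; values above 1.0 mean they were busier than usual.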
2: Download Wunderground data for the same time period. Try to predict usage from precipitation, temperature, etc.
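Assuming the Wunderground history has already been downloaded and joined by date to a daily usage number (the download itself isn't shown here), a first pass at prediction could be plain ordinary least squares. `fit_ols` is a hypothetical helper written with only the standard library so the example is self-contained; in practice NumPy's `numpy.linalg.lstsq` or a stats package would be the more usual choice.

```python
def fit_ols(rows, target):
    """Ordinary least squares fit of target ~ intercept + features.

    rows: one feature list per day, e.g. [precip_inches, mean_temp_f] (assumed features).
    target: the usage value for each of those days.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * y for x, y in zip(X, target)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting leaves A diagonal.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]
```

It solves the normal equations directly, which is fine for a handful of weather features but fails with a division by zero if two features are perfectly collinear.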
One option: work backwards. Identify days on which it rained all day, only in the afternoon, etc. (not just any day it rained), and see how these buckets affect the analysis.
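A sketch of the "work backwards" bucketing, assuming hourly precipitation readings (24 per day, in inches) are available for each date and a daily usage value has already been computed. The bucket names, the 0.01-inch "wet hour" threshold, the 16-hour cutoff for "all day", and noon to 6 pm as "afternoon" are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def rain_bucket(hourly_precip, wet=0.01, all_day_hours=16):
    """Classify one day from 24 hourly precipitation readings (index = hour 0-23).

    The thresholds are illustrative assumptions, not tuned values.
    """
    wet_hours = [h for h, p in enumerate(hourly_precip) if p >= wet]
    if not wet_hours:
        return "dry"
    if len(wet_hours) >= all_day_hours:
        return "all_day"
    if all(12 <= h < 18 for h in wet_hours):  # rain only between noon and 6 pm
        return "afternoon_only"
    return "other_rain"


def usage_by_bucket(usage_index, hourly_weather):
    """Mean usage per rain bucket.

    usage_index: {day: usage value}; hourly_weather: {day: 24 hourly readings}.
    Days without weather data are skipped.
    """
    groups = defaultdict(list)
    for day, value in usage_index.items():
        if day in hourly_weather:
            groups[rain_bucket(hourly_weather[day])].append(value)
    return {bucket: mean(vals) for bucket, vals in groups.items()}
```

Comparing, say, the "all_day" mean against the "dry" mean is a cruder but easier-to-read check than a regression, and it keeps rainy afternoons from being lumped in with washout days.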