Attendance tracking data - weather

Tom had previously posted a question about weather and attendance, looking at it from an attendance-forecasting point of view (https://community.tessituranetwork.com/topical_groups/analytics-coffee/f/discussions/22444/weather-and-attendance). Our question is more about tracking weather data to help analyze past attendance. Currently, the Museum tracks the temperature at a certain point of the day and an indicator of what type of weather was happening (cloudy, rainy, partly cloudy). I don't think this is quite what we need for analysis (i.e., we don't need to determine how partly cloudy and 73 degrees varies from cloudy and 69 degrees). My goal is to come up with a single indicator, preferably with a small number of potential values (fewer than 6), to help quantify whether the weather was a positive or negative factor in attendance for that day. Part of the key is to make the evaluation of that day's value as objective as possible, since it will be recorded by different people on different days. (I'd prefer not to do a general "on a scale of 1 to 5, did the weather positively or negatively affect attendance today?") Is anyone else doing this sort of analysis? If so, how do you quantify the effect of the weather? Thanks!

  • I am partway through a related project for the Met Museum in NYC. We had been logging weather in a similarly haphazard fashion, and decided it would be smart to have more robust data, so here’s where we are now:
     
    My source of weather data is https://openweathermap.org/api
     
    Every hour, I have a job that calls the API to get current conditions, for use in analysis later on. We figured hourly data would be good, since conditions can change throughout the day, and in theory that could drive changes in visitorship throughout the day. For a summarized daily view, we can get the high and low easily from that data, and then I have two methods of summarizing the general weather conditions: One option is to return the condition that was the most common during the day. The other option is to show the most severe weather condition for the day (I put a severity column on the weather conditions category table, and assigned severity myself. We haven’t fine-tuned it yet.)
     
    Every morning, I have a job that calls the API to get the 16-day forecast. I load it into the table so that, for every date, you can see what the forecast for that day was at each point from 1 through 16 days beforehand. The theory is that when we do the analysis, we will find interesting patterns in how much effect the forecast has at different lead times.
     
    Those two processes were only deployed to our production environment two days ago, so we haven’t done any actual research yet. Once we’re sure that it’s running smoothly in production, we also intend to buy the historical data for both hourly conditions and the forecasts so that we can get started on historical analysis without having to wait to collect the data going forward.
     
    I realize that this is sort of the opposite of a single indicator for the day with 5 or 6 possible values, but I would argue that by having more robust, automated, standardized data (rather than relying on a human to make a judgment at a certain time each day) you could do analysis to figure out which factors are significant, and build your key indicators from there.
     
    -Galen
     
    --
    Galen Brown
    Senior Systems Analyst
    Information Systems and Technology
    212 650 2649

    The Metropolitan Museum of Art

    1000 Fifth Avenue
    New York, NY 10028
    @metmuseum
    metmuseum.org
     
  • That's an impressive process, Galen - thanks for sharing.  I like the idea of having the automated feed.  I'll be curious to see how your analysis goes.
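
For anyone who wants to try the daily roll-up Galen describes (high, low, most common condition, most severe condition), here is a minimal Python sketch. It is not the Met's code. The condition labels, the severity scores, the HourlyReading fields, and the function names summarize_day and lead_days are all hypothetical and chosen only for illustration; a real feed would need its own condition codes mapped onto a severity table.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical severity ranking (higher = worse). These labels and scores are
# illustrative assumptions, not values from the thread or from any weather API.
SEVERITY = {
    "clear": 0,
    "partly cloudy": 1,
    "cloudy": 2,
    "rain": 3,
    "snow": 4,
    "thunderstorm": 5,
}


@dataclass
class HourlyReading:
    hour: int        # 0-23, local time
    temp_f: float    # temperature in degrees Fahrenheit
    condition: str   # must be a key of SEVERITY


def summarize_day(readings):
    """Collapse one day's hourly readings into a single daily summary."""
    if not readings:
        raise ValueError("need at least one hourly reading")
    temps = [r.temp_f for r in readings]
    counts = Counter(r.condition for r in readings)
    # Most common condition; ties go to the more severe condition so the
    # result does not depend on the order of the readings.
    most_common = max(counts, key=lambda c: (counts[c], SEVERITY[c]))
    # Worst condition seen at any hour of the day.
    most_severe = max(counts, key=lambda c: SEVERITY[c])
    return {
        "high_f": max(temps),
        "low_f": min(temps),
        "most_common": most_common,
        "most_severe": most_severe,
    }


def lead_days(issued_on, target):
    """Days between the date a forecast was issued and the date it describes."""
    return (target - issued_on).days


readings = [
    HourlyReading(9, 61.0, "cloudy"),
    HourlyReading(10, 64.0, "cloudy"),
    HourlyReading(11, 68.0, "rain"),
    HourlyReading(12, 70.0, "partly cloudy"),
]
summary = summarize_day(readings)
print(summary)
print(lead_days(date(2024, 5, 1), date(2024, 5, 17)))
```

Storing the lead time next to each forecast row would let you later compare, for the same visit date, what was predicted at each number of days out. Whether the most common or the most severe condition is the better daily indicator is exactly the kind of thing the analysis would have to show.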
