Tom had previously posted a question about weather and attendance, looking at it from the point of view of forecasting attendance (https://community.tessituranetwork.com/topical_groups/analytics-coffee/f/discussions/22444/weather-and-attendance). Our question is more about tracking weather data to help analyze past attendance. Currently, the Museum tracks the temperature at a certain point of the day and an indicator of what type of weather was happening (cloudy, rainy, partly cloudy). I don't think this is quite what we need for analysis (i.e., we don't need to determine how partly cloudy and 73 degrees varies from cloudy and 69 degrees). My goal is to come up with a single indicator, preferably with a small number of potential values (fewer than 6), to help quantify whether the weather was a positive or negative factor in attendance for that day. Part of the key is to make the evaluation of the value for that day as objective as possible, since this value will be recorded by different people on different days. (I'd prefer not to do a general "on a scale of 1 to 5, did the weather positively or negatively affect attendance today?") Is anyone else doing this sort of analysis? If so, how do you quantify the effect of the weather? Thanks!
You might want to check out Samuel Tran's section of this presentation from last year's conference:
https://www.tessituranetwork.com/Passthrough?itemUri=/tlcc/2019/Pres/06_20_Analytic_Integration.pptm
Available here under Quickfire/Tessitura Analytics Integrations:
https://www.tessituranetwork.com/en/Community-and-Events/Conference-Archive/tlcc2019archive#Materials
Thanks!
That's an impressive process, Galen - thanks for sharing. I like the idea of having the automated feed. I'll be curious to see how your analysis goes.
My weather source is Dark Sky (previously Forecast.io).
We are currently storing this data externally to Tessitura. I would love to get one or more summary values into the database and then into Tessitura Analytics. However, I'm on RAMP, so there are hurdles there. I'm just making it over those hurdles by getting a handle on the REST API.
We have not worked out the short-term forecasts (10-16 day) or how to include them in our model, as the MET seems to have done. Congratulations, Galen Brown. I would love it if you could share more about how you achieved this.
I'm wondering if some sort of clustering model might help pick the parameters and ranges for your weather categories. With 6 categories, I'm wondering if you would just end up with Winter, Spring & Fall, Summer, + 3 others. Carol Keeney, what had you envisioned you might end up with for the 6 categories?
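To make the clustering idea concrete, here is a minimal sketch, assuming each day is reduced to a (temperature, precipitation) pair. The function names, the sample days, and the choice of plain k-means are illustrative assumptions, not anything described in this thread. In practice a library implementation such as scikit-learn's KMeans would be the usual choice, and the features should be put on comparable scales first.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=100):
    """Plain k-means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration only. Features are NOT scaled
    here, so the one with the largest range (temperature) dominates.
    """
    # Deterministic start: spread the initial centroids across the sorted data.
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [
            tuple(sum(values) / len(g) for values in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Made-up sample days: (temperature in degrees F, inches of precipitation).
days = [
    (30.0, 0.0), (28.0, 0.1), (33.0, 0.0),
    (70.0, 0.0), (72.0, 0.0), (68.0, 0.1),
    (90.0, 0.0), (92.0, 0.0), (95.0, 0.0),
]
centroids = kmeans(days, k=3)
labels = [nearest(d, centroids) for d in days]
```

Looking at which days land in each cluster, and at the temperature and precipitation ranges each cluster spans, could suggest cut-offs for the categories. Whether those clusters end up being more than the seasons is exactly the open question.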
Like Tom, we've been using Dark Sky API to collect the weather data (temperature, chance of rain, precipitation intensity, humidity, etc) and store it in a data warehouse. This API offers an hour-by-hour forecast for the next 48 hours, and a day-by-day forecast for the next week.
To develop our model on Microsoft Azure ML Studio, we aggregated the historical attendance data and weather data in 1-hour intervals. We also added other factors such as weekday vs. weekend, holidays, etc. 70% of the data was used to train the model and 30% to validate it. The model is then used to predict attendance based on the weather forecast. Like Galen, we used more than one weather indicator, since we found that we got better results with more than 4 or 5 features.
We are now moving the workflow from Azure to local Python code to streamline it and implement a feedback loop to continuously train the model. We are evaluating these models: Random Forest, XGBoost, Gradient Boosting, etc.
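For anyone who wants to try the data preparation described above in local Python, here is a minimal sketch of the hourly aggregation and the 70/30 split. The function names, field names, and input shapes are assumptions for illustration, not the actual workflow. The thread does not say whether the split was random or chronological, so a seeded random shuffle is assumed here.

```python
import random
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def build_hourly_rows(scans, weather, holidays=frozenset()):
    """Join admissions and weather into one row per hour.

    scans    -- iterable of datetime, one per admission (assumed shape)
    weather  -- dict mapping hour-start datetime -> dict of weather readings
    holidays -- set of datetime.date values treated as holidays
    """
    counts = defaultdict(int)
    for ts in scans:
        counts[hour_bucket(ts)] += 1

    rows = []
    for hour in sorted(weather):
        rows.append({
            "hour": hour,
            "attendance": counts.get(hour, 0),
            "is_weekend": hour.weekday() >= 5,  # Saturday = 5, Sunday = 6
            "is_holiday": hour.date() in holidays,
            **weather[hour],
        })
    return rows


def split_70_30(rows, seed=0):
    """Shuffle with a fixed seed, then return (70% training, 30% validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


# Made-up example: three admissions on a Saturday, three hours of weather.
scans = [
    datetime(2019, 6, 15, 10, 5),
    datetime(2019, 6, 15, 10, 40),
    datetime(2019, 6, 15, 11, 15),
]
weather = {
    datetime(2019, 6, 15, 10): {"temp_f": 70.0},
    datetime(2019, 6, 15, 11): {"temp_f": 72.0},
    datetime(2019, 6, 15, 12): {"temp_f": 73.0},
}
rows = build_hourly_rows(scans, weather)
train, validate = split_70_30(list(range(10)))
```

A chronological split (train on the earlier hours, validate on the later ones) is a common alternative for time-based data, since it avoids validating on hours that sit between training hours.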
Tom - we were thinking of something like - on a scale of 1 to 5, was the weather a positive or negative influence on attendance today. Each of the levels would have associated weather metrics. 1 might be: below freezing and/or rain/sleet/snow. 5 might be temperature between 45 and 85 and sunny. I was trying to do this in order to keep the analysis simple; if we were to use a machine learning approach, we wouldn't have to do this.
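One way to keep a scale like this objective is to write the rules down as a small function, so that whoever records the value on a given day gets the same answer from the same readings. The sketch below uses the level 1 and level 5 definitions given above. The rules for levels 2 to 4, the function name, and the parameter names are placeholders invented for illustration and would need to be agreed on.

```python
def weather_score(temp_f, precipitation=None, sky="sunny"):
    """Return a 1-5 weather score for one day (1 = worst, 5 = best).

    temp_f        -- temperature in degrees Fahrenheit
    precipitation -- None, "rain", "sleet", or "snow"
    sky           -- "sunny", "partly cloudy", or "cloudy"

    Levels 1 and 5 follow the definitions in the post above.
    Levels 2-4 are HYPOTHETICAL placeholders.
    """
    # Level 1: below freezing and/or rain/sleet/snow.
    if temp_f < 32 or precipitation in {"rain", "sleet", "snow"}:
        return 1

    comfortable = 45 <= temp_f <= 85

    # Level 5: temperature between 45 and 85 and sunny.
    if comfortable and sky == "sunny":
        return 5
    # Level 4 (placeholder): comfortable temperature, partly cloudy.
    if comfortable and sky == "partly cloudy":
        return 4
    # Level 3 (placeholder): comfortable temperature, any other dry sky.
    if comfortable:
        return 3
    # Level 2 (placeholder): dry, but colder than 45 or hotter than 85.
    return 2
```

For example, weather_score(70, None, "sunny") returns 5, and weather_score(70, "rain", "cloudy") returns 1 because any precipitation takes priority over temperature.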
Samuel Tran,
What are you thinking about when it comes to the feedback loop?
Samuel Tran
When it comes to models, we have mostly used the same list: Random Forest, XGBoost, and Gradient Boosted Trees, in Python built around the scikit-learn library. XGBoost has typically worked the best for us on this project. We have not moved to any of the neural network approaches.
Are you forecasting at the hourly level or at a daily level?
I was thinking about using the newly aggregated attendance and weather data to re-train the model every other month. We are storing the attendance and the weather for the past hour in the warehouse.