How do you summarize a lot of data? And what biases do you bring to that work?

It could be argued that the infographic "The Histomap: Four Thousand Years of World History" by John B. Sparks (1931) shows a number of biases that we might find objectionable today. Even so, it is interesting to see an attempt to summarize a really "BIG DATA" set.

What is the biggest data set you have tried to summarize?  What biases do you bring to that work? What do you try to do to reduce your biases or at least make them clear to your viewers?

I became aware of the Histomap through the article "Histomap: Visualizing the 4,000 Year History of Global Power" by Nick Routley.

--Tom

Parents
  • Here is another approach to summarizing 2,000 years of history.

    Does this do any better a job?  What are the biases here?  Who's missing?

    I discovered this in the article "2,000 Years of Economic History in One Chart" by Jeff Desjardins.

  • I don't know that either of these does a better job than the other. They're answering different questions within the umbrella of summarizing all of world history. The GDP breakdown is more quantitative, which makes it easier to understand in some ways, and potentially less subjective, if you keep in mind that it only covers the history of GDP. The Histomap is much less focused on actual numbers: there's no indication at all of how the layout was decided, but it does have an additional layer of context that I personally really like (though it is hard to read, even in high resolution). Both of these are focused on established countries/empires, which ignores scores of other kinds of history (nomadic peoples, religious power, artistic contributions, inventions, etc.). Summarizing the whole of the world's history is too big a question, I think.

    (As for my own experience with large datasets, I've not done much in the way of truly big ones. Regardless, I think explaining the assumptions made in creating any analysis is vital for continued legibility. Defining terms clearly and succinctly and expressing where things might skew greatly improve understanding - and also help me identify where I need to adjust my own process.)

  • Here is an even bigger map.

    https://www.visualcapitalist.com/wp-content/uploads/2017/11/histomap-big.html

    What do others think?  Does this have anything to do with the kind of data analysis we do?

  • The data assumptions make me very twitchy, but from an aesthetic POV the colour scheme and labeling are effective.  I'm always a bit curious as to a semi log axis. At what point is it showing trends v prestidigitation.

    I kinda want to do a histogram on customer segments (subscription, single tickets etc) by year now! At a PAC you could do one on genre attendance by year which would be interesting for Artistic Programming.

  • When I first saw your post, I wanted you to say a bit more about this:

    "I'm always a bit curious as to a semi log axis. At what point is it showing trends v prestidigitation."

    And then I looked more closely at the axes of the second (GDP) graphic and now see what you mean. The calendar on the X-axis is strange: it hides a lot of potential variability, particularly for the years 0 - 1820, which cover only the first quarter of the graphic.

    The Y-axis is also likely covering a huge set of variabilities in the absolute numbers by showing Percent of the whole economy rather than absolute numbers.

    What do we think: is this lying with statistics? Or does it help to make a point? I clearly missed the time compression on the X-axis at first.

    I know that I like to look at sales curves early on in the sales cycle on a log scale because it tends to allow me to see activities in the early sales on an event.  This is particularly helpful when I'm comparing them to past performances that have completed sales records.  If the Y-axis were to be linearly scaled to fit the total final sales you would miss the details in these early sales.

    For the graph that you want to do, are you thinking about time on the X-axis? What sort of time grouping would you use: daily, weekly, monthly, quarterly, annually? Would you look at the sale date or the performance date? What would you put on the Y-axis: % of the total for the period, or absolute values?

    I'd love to see what you end up with.
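
To make the log-scale point about early sales concrete, here is a minimal Python sketch. The weekly sales figures are made up for illustration, and the two helper functions are hypothetical names, not part of any library. It compares how far up the Y-axis each cumulative sales value lands on a linear axis scaled to the final total versus on a log axis.

```python
import math

# Hypothetical cumulative ticket sales by week on sale (made-up numbers).
cumulative_sales = [5, 12, 30, 80, 400, 2000, 5000]


def linear_position(value, axis_max):
    """Fraction of the axis height a value reaches on a linear Y-axis."""
    return value / axis_max


def log_position(value, axis_min, axis_max):
    """Fraction of the axis height a value reaches on a log10 Y-axis."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


axis_min = 1  # a log axis cannot start at zero
axis_max = cumulative_sales[-1]  # scale both axes to the final total

for week, sold in enumerate(cumulative_sales, start=1):
    lin = linear_position(sold, axis_max)
    log = log_position(sold, axis_min, axis_max)
    print(f"week {week}: sold={sold:>5}  linear={lin:6.1%}  log={log:6.1%}")
```

With these numbers, the first three weeks all sit below 1% of the axis height on the linear scale, so they are effectively invisible, while on the log scale they are spread across roughly the bottom 40% of the axis. The trade-off is the one raised earlier in the thread: the log axis also makes large absolute differences late in the cycle look small.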

Children
  • Sorry I've been pretty hammered at work and just recovering from a migraine.

    Yep, I think time on the X-axis would be fine, reading left to right, with each segment shown as a % of the total audience.  Absolute numbers might be better for overall changes in audience size, shown in a parallel chart underneath.  Quarters or years, depending on the business.  Theatres and other venues with seasonal attendance (especially subscribers) would likely be better served by annual grouping.

    _____

    While I don't think it's necessarily lying, the graphic is showing the story it wants to show rather than being something of statistical merit.  GDP is a fairly inaccurate metric, with a huge amount of trade being undeclared or inaccurately measured, e.g. in global fishing.

    Charts always simplify data to make a point.  Whether that is dishonest depends on the question being asked.  It's incredibly important to list one's assumptions prior to going on a data search, as they distinctly colour the questions asked and what you might be measuring.

    That 20thC hump is really trying to say something about US v China as a percentage of global GDP so I'd eat my hat if there wasn't an agenda attached.  It doesn't mean it's not accurate given the conceits of the measurement but standing alone it's a bit quizzical. China's growth is given an insert in red and the "Rest of the world" is in very low colour contrast as if to be added as an afterthought.

    Also, any measure of GDP prior to the 19thC is a bit hand-wavy at best.  Comparing the Histomap to the IMF graphic shows huge variations at 0 AD.  And what does "Ancient" even mean?

    ____

    Please don't think I'm attacking the graphs (it's my inner socialist rising to the subject matter) - the examination of these is GREAT and really fun to pick at.  Aesthetically, a lot of the displays in the IMF collection tell a particular story quickly and simply, with a LOT of graphs (3), yet don't seem too cluttered.

    Hope that helps and that you don't wish you'd never asked :)
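
The segment chart described at the top of this reply (time on the X-axis, each segment as a % of the total audience, with absolute totals shown alongside) could be prototyped roughly as below. This is only a sketch: the ticket records are invented, the segment names are borrowed from earlier in the thread, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical ticket records: (year, segment, tickets). Made-up numbers.
records = [
    (2021, "subscription", 400),
    (2021, "single ticket", 600),
    (2022, "subscription", 450),
    (2022, "single ticket", 1050),
]


def totals_by_year(rows):
    """Absolute audience size per year, summed across all segments."""
    totals = defaultdict(int)
    for year, _segment, tickets in rows:
        totals[year] += tickets
    return dict(totals)


def share_by_year(rows):
    """Each segment's fraction of that year's total audience."""
    totals = totals_by_year(rows)
    counts = defaultdict(lambda: defaultdict(int))
    for year, segment, tickets in rows:
        counts[year][segment] += tickets
    return {
        year: {segment: n / totals[year] for segment, n in segments.items()}
        for year, segments in counts.items()
    }


totals = totals_by_year(records)
for year, segments in sorted(share_by_year(records).items()):
    parts = ", ".join(f"{name}: {share:.0%}" for name, share in segments.items())
    print(f"{year} (total {totals[year]}): {parts}")
```

In this invented data, subscriptions grow from 400 to 450 tickets while their share of the audience falls from 40% to 30%, which is why showing the absolute totals next to the percentages seems worthwhile: either view alone tells only half the story.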

  • I'm super glad I asked the question.  You have demonstrated the kind of critical thinking about the topic that I try to do, and that I'm trying to invite others to consider by asking the question.

    Thank you for your comments.

    Hope you feel better.

    --Tom

    What do others think?