Gus Katsaros

Hockey Analytics


Analytics Data Providers Q&A

Wednesday, November 7, 2018

Hail the data providers.


Detailed analysis produced in the public sphere by hobbyists, and by some exceptional talent headed towards team employment, couldn’t be accomplished without the fine work of another group of dedicated and articulate hobbyists.




Don’t forget, for everything NHL, check out Rotoworld's Player News, and follow @Rotoworld_HK and @KatsHockey on Twitter.


Data providers are often thanked and almost always credited (those who don’t properly credit data sources should be called out for such egregious behavior), but there is so much more to the job than just getting a website up and running. That’s merely step one.


A few years back, I tried my hand at providing CHL data. Partnered with a talented developer, I embarked on building a self-sustaining website that would update extended CHL player data nightly. We quickly found out that the logistics of maintaining the endeavor were immense. The work effort and maintenance costs were above and beyond anything a pair of hobbyists could provide. Support was fairly non-existent while data requests and display recommendations were plentiful, which really shows how pervasive data has become. Suggestions for further enhancements were overwhelming.


The site never made it out of beta.


I’d like to shine some light on some of the data providers that have made recent contributions in the public sphere, in the hopes of encouraging some public support and patronage. The work being done is exemplary, and since there’s a distinct need, having a variety of sources can be beneficial, especially if an NHL team decides it would like to internalize one of the sites (see @ExtraSkater).


I’ve often cited some of my most commonly used sites, including Hockeyviz and Corsica Hockey.*** My most widely used data is tracked by Corey Sznajder, who provides different sets of data or visualizations, each used for a different purpose. Many analyses incorporate multiple data providers, used in coordination with each other.


For example, reading player line formations from Hockeyviz, then using available microstats from Sznajder’s tracking data, coupled with data from Corsica.Hockey, can provide a robust enough data sample to perform advanced analysis and get a very good indication of a player’s performance with accompanying context.


*** Note: I engaged Corsica to participate, but didn’t receive a response. The site is a fantastic resource and one of the longest-standing sites, if not the longest-standing, currently in operation after a reboot to Corsica 2.0. The site is administered by Emmanuel Perry.



Natural Stat Trick


I was working on the McKeen’s Hockey Yearbook at the time Hockey Analysis went dark. I tried to access the site only to receive an error message. Considering the amount of incomplete work, sheer panic engulfed me entirely. The online world had lost the pre-eminent site for With or Without You analysis (WOWY for short) because founder David Johnson had been snapped up by the Calgary Flames.


The immediate void was gargantuan, but fortunately for the online world, Natural Stat Trick (NST) had already developed WOWY results and included them on the site. Further site enhancements include a line tool to isolate different combinations of players (up to five) on the ice. The stampede to find the next available site for this kind of analysis could have been overwhelming, but NST saved the day.


The site offers more than just simple WOWYs, and a description here won’t do its functionality justice.
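
As a rough illustration of what a WOWY comparison computes, the sketch below splits one player's shot attempts by whether a given teammate was on the ice at the same time. The `Segment` records, field names and player labels are hypothetical, and this is not a description of how Natural Stat Trick builds its tables.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of play (hypothetical record layout)."""
    on_ice: frozenset        # skaters on the ice for the team of interest
    attempts_for: int        # shot attempts by that team
    attempts_against: int    # shot attempts by the opponent


def shot_share(segments):
    """Shot-attempt share in percent, or None when there were no attempts."""
    cf = sum(s.attempts_for for s in segments)
    ca = sum(s.attempts_against for s in segments)
    total = cf + ca
    return None if total == 0 else 100.0 * cf / total


def wowy(segments, player, teammate):
    """Player's shot share with the teammate on the ice, and without."""
    with_tm = [s for s in segments
               if player in s.on_ice and teammate in s.on_ice]
    without_tm = [s for s in segments
                  if player in s.on_ice and teammate not in s.on_ice]
    return {"with": shot_share(with_tm), "without": shot_share(without_tm)}


# Made-up data for three skaters labelled A, B and C.
segments = [
    Segment(frozenset({"A", "B"}), 6, 4),
    Segment(frozenset({"A", "C"}), 3, 7),
    Segment(frozenset({"A", "B", "C"}), 5, 5),
    Segment(frozenset({"B", "C"}), 2, 2),   # A is off the ice, so ignored
]

result = wowy(segments, "A", "B")
print(result)  # {'with': 55.0, 'without': 30.0}
```

In this toy data, player A's share is 55 percent alongside B and 30 percent apart from B. That gap is the kind of pattern a WOWY table is meant to surface.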


Brad Timmins, founder and proprietor of the site, explained a little about how that process played out in real time. I asked Brad a few questions about the site, its upkeep and maintenance challenges.


  • When Hockey Analysis went dark, this was the site that picked up the slack for WOWYs. How did this flock of new users affect your ability to keep the site up and running?


The first sign I had of Hockey Analysis going dark was when I started getting server resource alarms. I was using most of the server's resources to add older seasons because there's pretty much no traffic at the start of August, right? By that point the site had already ground to a halt.


In the end I did have to move up to a more powerful server, which is fortunately quick and easy to do.


  • You’ve added new components to the site, including a variety of methods to download data. How have data accessibility requirements from users changed the way you’ve designed the site?


It's mostly about the little things, and recognizing the differences between what works for someone reading the site and what works best for someone doing work with the data. The eye and brain comprehend ice time as 6:37 much better than 6.617, but anybody who has tried to work with times in Excel knows the misery that can bring. So the site displays it as 6:37, but the copy and download buttons give it to you as 6.617. Things like hiding less-used columns by default to limit horizontal scrolling, but a button to download the entire table without having to un-hide all of the columns first.
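
The display-versus-download distinction Brad describes can be sketched in a few lines. The function names here are hypothetical, and this is only a guess at the kind of conversion involved, not the site's actual code.

```python
def mmss_to_minutes(text, places=3):
    """Convert an 'M:SS' ice time into decimal minutes for spreadsheet work."""
    minutes, seconds = text.split(":")
    return round(int(minutes) + int(seconds) / 60, places)


def minutes_to_mmss(value):
    """Convert decimal minutes back into 'M:SS' for on-screen display."""
    total_seconds = round(value * 60)
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


print(mmss_to_minutes("6:37"))   # 6.617
print(minutes_to_mmss(6.617))    # 6:37
```

Keeping the decimal value as the stored form and formatting it only for display means the copy and download paths never have to parse a time string.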


  • Aside from WOWYs, are there any other metrics that you plan on adding to the site (e.g. WAR/GAR, expected goals)?


The current plans are more incremental than adding entirely new metrics to the site. Adding metrics that can already be calculated from what is there, but not being done for you yet. Adding more options to the filters - I've had several requests recently for 6v5 and 5v6 options. I'd like to expand the line tool to let you do line vs line or line vs pairing opposition WOWYs. I'd like to add expected goals at some point too, but developing and testing a model or even implementing someone else's is a big job.


  • How do you manage costs associated with the site? Is there a Patreon page?


The first step is to keep the costs down, so I've put a lot of time into making everything run as efficiently as possible so it runs on the least expensive hosting possible. I mentioned having to move up to a more powerful server earlier, but it's still pretty low-end overall.


There is a Patreon for the site as well.


It has allowed me to add resources when they are needed, and even run a secondary server over the summer so I could make significant background changes without taking the live site down for maintenance.




The last point Brad made is more important than the statement makes it seem. Having a dedicated space to back up a production environment while performing upgrades or integrating new code is crucial, especially when functionality is added incrementally.


Relying on a production environment alone, without any testing/acceptance environment, makes developing and promoting enhancements and code changes tricky and fraught with potential bugs, or downstream actions that cause problems with existing functionality.


Regression testing is important: testing to ensure newly added functionality didn’t break or alter current functionality. Adding new functionality to the detriment of existing functionality just doubles the work effort.
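
A minimal sketch of the idea follows, assuming a hypothetical `per_60` rate function that has just gained an optional `decimals` argument. The first test pins the results the function produced before the change, and the second covers the new option.

```python
def per_60(count, toi_minutes, decimals=2):
    """Rate stat per 60 minutes; `decimals` is the newly added option."""
    if toi_minutes <= 0:
        return 0.0
    return round(count * 60 / toi_minutes, decimals)


def test_existing_behavior_unchanged():
    # Values recorded before the change; they must still hold afterwards.
    assert per_60(3, 45.0) == 4.0
    assert per_60(0, 12.5) == 0.0
    assert per_60(5, 0) == 0.0


def test_new_option():
    # The new functionality gets its own check.
    assert per_60(1, 7.0, decimals=1) == 8.6


test_existing_behavior_unchanged()
test_new_option()
print("regression checks passed")
```

Running checks like these in a test environment before promoting code is what keeps a new feature from quietly changing numbers users already rely on.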


Barlowe Analytics


Barlowe Analytics is run by Matt Barlowe through the Twitter feed (@BarloweAnalytic). There’s a unique quality to this ‘data provider’ because there’s no website. Data retrieval is conducted by a Twitter query, with values or a chart returned in a reply tweet to the user who posted the query.


The two distinct elements of Barlowe Analytics are the Twitter bot and, maybe even more important, the tutorials on wide-ranging technical subjects, encompassing programming languages and visualizations.


My favorite may be the SQL modules, due to the language’s ease of use and structure (select these records, from this data source, where these conditions exist). The rest is just syntax that’s easily learned.
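
That select/from/where structure can be tried out with Python's built-in `sqlite3` module. The `skaters` table, its columns and its rows below are invented for illustration and are not taken from the tutorials.

```python
import sqlite3

# In-memory database with a made-up table of skaters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skaters (name TEXT, team TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO skaters VALUES (?, ?, ?)",
    [
        ("Player A", "Team X", 31),
        ("Player B", "Team X", 12),
        ("Player C", "Team Y", 24),
    ],
)

# Select these records, from this data source, where these conditions exist.
rows = conn.execute(
    "SELECT name, goals FROM skaters WHERE goals >= 20 ORDER BY goals DESC"
).fetchall()
conn.close()

print(rows)  # [('Player A', 31), ('Player C', 24)]
```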


The Barlowe Analytics ‘query bot’ has recently been posting game probabilities prior to matches, and recap data as well. Matt has also developed his own expected goals methodology, including code to allow users to create their own.


The bot itself offers several individual features, including:

  • League leaders
  • Game recaps
  • Playoff probability
  • Game situation probabilities



Data providers are becoming more widely accessible, with more sites available, but this is a unique option for data retrieval when internet usage is restricted, or for quick, easy answers. A quick query with a few variables returns immediate results. This functionality can benefit a wide variety of users, from a casual Twitter user to an analyst, or someone at the game with too little data to load heavy sites. Using any application, a user can get a quick chart or stat.


A primer on how to use the Twitter bot is below.



  • What was the inspiration for the tutorials?

For me the inspiration was that I remember how difficult it was when I was starting to learn analytics. There’s a lot of material out there but almost none of it pertains to hockey or if it does it’s at such a high level that it would just be unintelligible for a beginner. So I wanted to give people tools to help them get started and make it easier for them than it was for me.

  • In creating these tutorials, did they open any new avenues or introduce any ideas that you hadn’t considered in the past?

Maybe not the tutorials themselves but you get a lot of people asking questions or showing you stuff they’ve worked on that you can learn from.


The Barlowe query bot is a unique feature and perhaps the first – or at least one of the first – to make an appearance on Twitter.


  • Was the intent here for quick data retrievals?

Yeah I believe I came up with it when Corsica was down or in the midst of its new implementation where it didn’t have all its old features up yet. One of my favorite things built on that old site was the rolling average graphs of certain stats over a time frame. I basically wanted to provide people a way to get those but I didn’t want to go to all the trouble of building a website mainly because it’s not something I’m very good at. But yeah, I wanted people to be able to get quick data on players and teams while just on their phone.
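
For readers unfamiliar with the rolling averages Matt mentions, here is a minimal sketch over a made-up list of per-game values. The window size and data are assumptions, and this is not how Corsica or the bot computes its charts.

```python
from collections import deque


def rolling_average(values, window):
    """Average of the most recent `window` values; partial windows are skipped."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)   # oldest value drops off automatically
    averages = []
    for value in values:
        recent.append(value)
        if len(recent) == window:
            averages.append(sum(recent) / window)
    return averages


shots_per_game = [2, 4, 3, 5, 1, 6]   # hypothetical game log
smoothed = rolling_average(shots_per_game, 3)
print(smoothed)  # [3.0, 4.0, 3.0, 4.0]
```

Plotting a series like `smoothed` against game number gives the smoothed trend line that makes hot and cold stretches easier to see than the raw game-by-game values.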

  • Have you been able to measure public usage of the bot to date?

I can see the notifications it gets when people tweet at it but that’s about all I use to measure its usage. There are a few people that really seem to enjoy the feature.

  • Is there an intent to expand the bot functionality?

Yeah I want to add some more stats, mainly relative teammate statistics, to it that I’m building on my new database. I also want to change the syntax of the queries to make it a little easier to use and include the extra seasons from my new database. It’s currently still running on the old database which only goes back to 2015 I believe.

  • Do you offer personal tutorial services for anyone interested in furthering their knowledge and/or functionality?

Yeah if anyone wants to set up private tutoring sessions I’d be more than happy to help accommodate that for a modest fee. My main focuses are Tableau, Python, R, and SQL. I also know a fair bit about some AWS services as well.

  • How do you manage costs associated with the site? Is there a Patreon page?

No I currently do not have a Patreon and as of right now everything I do is free and the plan is to keep it that way. I won’t rule out asking for money in the future if things get too expensive but there are no plans to do so at the moment.


Evolving Wild



It’s one thing to create, write about and provide data on new(er) metrics, but there’s more involved than that. The Evolving Wild page is built by a set of twins (fitting that they hail from Minnesota, but this isn’t a baseball blog) who knocked the online world for a loop when they announced how their Twitter feed was run.


Josh and Luke have built this site and have been public proponents of the GAR/WAR debates. The 2018 Twitter WAR debate is a prime example. There’s a level-headed intent in their debate and a feedback loop they administer on their own.


Over this past weekend, they attempted a cool feature, trying to show just how their model interprets plays and the numerical values their expected goals model assigns.


This is absolutely great.


Constantly testing and refining models strengthens the final results, adding versatility as new knowledge and technology overwrite past learnings. Developers and analysts can do this privately behind the scenes, tweaking and releasing the new data without any public transparency.


Here, the twins did it in a public setting, offering the ultimate form of transparency. The effort spawned some requests from other Twitter users.


The first thread is here:



And an additional thread:



Here’s a Q and A with Evolving Wild.


Q: The site looks great and the functionality for users is slick and easy. Was there an inspiration for the design?


A: Thank you! But to be honest, not really. The site is made using the R programming language's Shiny package, which comes with built in themes. We've made some modifications to the CSS styles and have been very aware of the overall user experience (table layouts, charts, etc.), but the actual design of the website is essentially a stock theme.


Q: How long did the planning and coding take for the effort before the site went live?


A: In terms of actually writing the code to make the website, I'd say it's probably taken us

Gus Katsaros is the Pro Scouting Coordinator with McKeen’s Hockey, publishers of industry leading scouting and fantasy guide, the McKeen’s Annual Hockey Pool Yearbook. He also contributes to popular blog ... he can be followed on Twitter @KatsHockey
