
Can Anyone Help with Finding Correlations Between Two Lists?


OnlyPantherFaninMaine


The catch is that the lists do not include any numbers, only rankings of all 32 NFL teams on various statistical metrics.

 

Using some team statistics from the NFL security meetings that my boss provided me with this summer, I am looking to see if there are any concrete correlations among a variety of statistics and metrics. I have a list of all 32 teams and their fan base rankings on a wide array of measurables, including fan code of conduct awareness, fan behavior outside and inside stadiums, revenue at risk for a franchise, and fan awareness of a fan conduct textline.

 

I will then compare the rankings for all of these with lists I have compiled on crime rates in all 32 NFL cities, the net worth of all NFL franchises, the average temperature of each city during football season, all-time team winning percentages, and the capacity of all 32 stadiums. The goal of this little experiment is to see whether there are any definite correlations among these metrics, and what I can conclude about the league or specific teams from my findings.

 

My problem is determining and carrying out the proper methods to find out whether there are indeed correlations among these many lists. The lists contain only team names ranked 1-32, even though in most cases each team's position stands for an underlying numerical value.

 

 


Yep. You can do correlations in Excel pretty easily, but really only two variables at a time.

 

So if you want to find the correlation between wins and temperature, no problem.

 

And if you want to find the correlation between crime and wins, no problem.
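As a rough sketch of what "two variables at a time" looks like outside Excel, the Python below (an assumed choice of language; nobody in the thread uses it) computes the same Pearson coefficient that CORREL returns and simply loops over every pair of metrics. The metric names and rank values are made up for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (what Excel's CORREL returns)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(metrics):
    """Correlate every pair of metrics, two at a time.

    `metrics` maps a metric name to a list of values given in the same team order.
    """
    return {(a, b): pearson(metrics[a], metrics[b]) for a, b in combinations(metrics, 2)}


# Hypothetical ranks for five teams (made up, not real data)
metrics = {
    "wins": [1, 2, 3, 4, 5],
    "temperature": [2, 1, 3, 5, 4],
    "crime": [5, 3, 4, 1, 2],
}
for (a, b), r in pairwise_correlations(metrics).items():
    print(f"{a} vs {b}: {r:+.2f}")
```

Each coefficient is still a separate two-variable comparison; looping over pairs does not tell you which metric matters more once the others are accounted for.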

 

But if you want to design a model that determines whether temperature or crime has the stronger relationship with wins, that gets complicated pretty quickly (that's multiple regression, which falls under what's called generalized linear modeling).

 

In other words, what if you determine that wins are highly correlated with both high temperatures and high crime? Is it a situation where high crime is simply correlated with high temperature and has no real bearing on wins?
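One standard way to probe that question is a partial correlation: the correlation between wins and crime after the linear effect of temperature has been removed from both. A minimal sketch, assuming the textbook first-order formula r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)); the numbers are made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z.

    Undefined (division by zero) if z is perfectly correlated with x or y.
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical, made-up data for six teams (not real statistics)
wins = [11, 9, 10, 6, 5, 3]
crime = [40, 35, 38, 25, 27, 20]
temp = [80, 72, 75, 60, 62, 50]

print("r(wins, crime):               ", round(pearson(wins, crime), 3))
print("r(wins, crime | temperature): ", round(partial_corr(wins, crime, temp), 3))
```

If the raw correlation is large but the partial correlation is close to zero, that fits the "crime only tracks temperature" story. It does not prove it, and with 32 teams the estimate is noisy.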

 

Any of you stats guys, correct me if I am wrong.


Problem is, all my lists just have team names, ranked 1-32 by whatever metric I want to compare. Sure, the team name represents some numerical value most of the time, but I don't necessarily have all the numbers at hand to compare directly. I essentially have to find correlations based only on where each team name ranks on a given list. I was thinking of just looking at the top 10 and bottom 10 teams in each list and seeing how many are the same and how many differ, to determine whether there is a strong correlation. I can't think of any other way, because I don't have as many numerical values to work with as I would like.
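The top-10/bottom-10 idea can be sketched as below. One caveat worth checking: even if two rankings of 32 teams were completely unrelated, their top 10s would share about 10 × 10 / 32 ≈ 3.1 teams on average (the mean of a hypergeometric distribution). So an overlap of 3 or 4 says very little. The team names in the usage example are hypothetical placeholders.

```python
def top_bottom_overlap(list_a, list_b, k=10):
    """Count teams in the top k of both rankings and in the bottom k of both.

    Each list holds team names ordered from #1 down to last place.
    """
    if sorted(list_a) != sorted(list_b):
        raise ValueError("both lists must contain the same teams")
    top = set(list_a[:k]) & set(list_b[:k])
    bottom = set(list_a[-k:]) & set(list_b[-k:])
    return len(top), len(bottom)


def expected_random_overlap(n_teams, k):
    """Average top-k overlap if the two rankings were unrelated (k * k / n)."""
    return k * k / n_teams


# Hypothetical placeholder rankings, not real data
list_a = ["Team A", "Team B", "Team C", "Team D", "Team E", "Team F"]
list_b = ["Team B", "Team A", "Team F", "Team C", "Team E", "Team D"]
print(top_bottom_overlap(list_a, list_b, k=2))
print(expected_random_overlap(32, 10))
```

This approach ignores the 12 teams in the middle of each list. A rank correlation (CORREL applied to the rank numbers) uses all 32 positions instead.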


Be careful of a Huddle axiom, which is confirmation bias, even in your correlation comparisons.

 

What is your sample size of info for each team, and what is the data source?

 

Metrics can be tricky and misleading, in the sense that they invite what is called subjective conclusions or reasoning.

 

Maybe you can come up with something similar to Beta for stocks, then give each team a plus or minus relative to that Beta. Maybe that can help shed more light.
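A stock's beta is the covariance of its returns with the market's returns divided by the market's variance, which is the slope of a least-squares line. One loose reading of this suggestion (an interpretation, not something spelled out in the post) is to fit that slope between two metric rankings and then look at each team's "plus or minus": how far it sits above or below the fitted line. The team names and ranks below are hypothetical.

```python
def beta_and_residuals(y, x):
    """Least-squares slope of y on x (the 'beta') plus each item's residual.

    A positive residual means the item sits above the fitted line, a negative one below it.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    beta = cov / var
    alpha = my - beta * mx  # intercept of the fitted line
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    return beta, residuals


# Hypothetical ranks for five placeholder teams (made up)
teams = ["Team A", "Team B", "Team C", "Team D", "Team E"]
attendance_rank = [1, 2, 3, 4, 5]
conduct_rank = [2, 1, 5, 3, 4]
beta, resid = beta_and_residuals(conduct_rank, attendance_rank)
print(f"beta = {beta:.2f}")
for team, r in zip(teams, resid):
    print(f"{team}: {r:+.2f}")
```

Teams with the largest residuals are the ones worth the "look beyond the #'s" treatment, since the overall trend doesn't explain them.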

 

I'm decent at Excel but prefer gleaning my insights from other methods rather than "cheating" with Excel data input. Call me old-fashioned.

 

 


Most of my source data is from my boss (head of security for the Patriots), in the form of graphs, tables, and statistics that were compiled on each team during the most recent NFL security meetings, which take place annually. I took the five lists from those documents that I thought were most pertinent to my study, and then came up on my own with the metrics to look for correlations against. Using open-source data from the likes of Forbes, I made ranked lists (#1-#32) covering every team/NFL city for stadium seating capacity, all-time winning percentage, city crime rate, and average city temperature during the football season. I want to see if there is a correlation between those metrics and the lists I made from the NFL security meeting material. However, there are no numerical values for some of these, so right now I just have a bunch of lists with all 32 NFL teams in a bunch of different orders, because the metrics are so different. I'm not sure Excel would let me find correlations/relationships between lists that contain only words (in this case, NFL cities/teams) rather than numbers.


Good that you noticed there are no numerical values. So create your own values for those.

Then be ready to look beyond the #'s.

For example, some stadiums have a larger visiting fan base. Is that because of a shorter drive for fans, West Coast vs. East Coast, being in the same state (Ohio/NY/Texas), or being in the same time zone?

Your data source can give you just enough context; then you can read between the lines as you take a step back.

Good luck


You would need to assign a numerical value (its rank) to each team on each list. You can either type them in or, if you can paste the ranked name lists in, use VLOOKUP.
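For anyone not using Excel, the VLOOKUP step amounts to building a lookup table from team name to position and then reading off each team's rank in both lists. A minimal Python sketch (the function name is hypothetical), using the same three-team lists as the worked example later in this thread:

```python
def paired_ranks(list_a, list_b):
    """Line up two ranked lists of team names, the way a VLOOKUP would.

    Returns (team, rank_in_a, rank_in_b) rows in list_a's order (1 = top).
    """
    # Lookup table: team name -> its position in list_b
    rank_in_b = {team: pos for pos, team in enumerate(list_b, start=1)}
    missing = [team for team in list_a if team not in rank_in_b]
    if missing or len(list_a) != len(list_b):
        raise ValueError(f"lists do not contain the same teams: {missing}")
    return [(team, pos, rank_in_b[team]) for pos, team in enumerate(list_a, start=1)]


winning_pct = ["Carolina", "Falcons", "New Orleans"]
crime_rate = ["New Orleans", "Falcons", "Carolina"]
for row in paired_ranks(winning_pct, crime_rate):
    print(row)
# ('Carolina', 1, 3)
# ('Falcons', 2, 2)
# ('New Orleans', 3, 1)
```

The names must be spelled identically in every list (for example "Falcons" vs. "Atlanta" would not match), which is the same pitfall VLOOKUP has.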

 

Then it's just a matter of plugging those numbers into the CORREL function linked below.

 

Given that you are only using values 1-32 (as opposed to the actual winning percentages, crime rates, etc.), you may not get as accurate a result as you'd like.

 

 

http://office.microsoft.com/en-us/excel-help/correl-HP005209023.aspx
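Because CORREL is being fed ranks rather than raw values, the number it returns is Spearman's rank correlation (rho). With no tied ranks, rho can also be computed as 1 − 6Σd² / (n(n² − 1)), where d is the gap between a team's two ranks. To judge whether a given rho could just be chance with only 32 teams, one option (an addition here, not something CORREL does) is a permutation test: shuffle one list many times and count how often a rho at least that large turns up. A sketch with hypothetical function names:

```python
import random


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rank lists with no ties."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def permutation_p_value(ranks_a, ranks_b, trials=10_000, seed=0):
    """Approximate two-sided p-value: the share of random shuffles of ranks_b
    whose |rho| is at least as large as the observed |rho|."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(ranks_a, ranks_b))
    shuffled = list(ranks_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Small tolerance so float round-off doesn't hide ties with the observed value
        if abs(spearman_rho(ranks_a, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (trials + 1)
```

A small p-value (say under 0.05) suggests the two orderings agree more than random shuffling would explain. It still says nothing about cause, and testing many pairs of lists will turn up a few "significant" ones by chance.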


So each team/city will have a numerical value (between 1 and 32) that stays the same throughout this whole process? Right now my lists are ranked 1-32, with the greatest value always at #1, but the #1 team is rarely the same from list to list. Would I assign the same number to every team throughout the whole experiment, regardless of where it ranks on a given list?

 


Let's say your winning percentage list goes:

Carolina 1
Falcons 2
New Orleans 3

and your crime rate list goes:

New Orleans 1
Falcons 2
Carolina 3

Your two columns of numbers (one row per team, in the same team order: Carolina, Falcons, New Orleans) would be:

1  3
2  2
3  1

and you would use the CORREL function on those two arrays.

 

Somebody correct me if I am wrong.
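As a quick check of the example above, here is the same Pearson calculation CORREL performs, written out in Python (the function name `correl` is just a label chosen here) and applied to the two columns:

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation, the statistic Excel's CORREL computes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# One row per team (Carolina, Falcons, New Orleans): winning-% rank, crime-rate rank
win_rank = [1, 2, 3]
crime_rank = [3, 2, 1]
print(correl(win_rank, crime_rank))  # -1.0
```

A result of −1 means the two orders are exactly reversed. The rank-difference shortcut agrees: d = −2, 0, 2, so Σd² = 8 and 1 − 6·8 / (3·(3² − 1)) = 1 − 48/24 = −1.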



