In April 2020 the new James Bond movie will be released. It is going to be the 25th movie about agent 007 in the official movie series.
There are at least two more James Bond movies that are not part of the official series:
25 movies is a serious number. I decided to get raw data about those movies and look at it. The reason I started all this was to find out (at last) which actor is the best James Bond.
To parse and analyse data, I first need data. What kind of data can I get about a movie?
I use the IMDb site a lot, so I decided to take the rating and the number of votes from there.
The first thing I did was create a list of all the canonical James Bond movies.
IMDb allows you to export that list as a csv file. The file contains a lot of the information I need (title, year, rating, number of votes), but there is no info about the actor who plays the main role. Frankly speaking, for 25 rows it is pretty easy to add the actor by hand, but I decided to write a small script to get that data. So, from the original csv file I used only the movie ids:
$ cat imdb_list.csv | perl -F, -nalE '$F =~ /tt(\d+)\//; say $1'
0055928
0057076
0058150
0059800
0062512
0064757
0066995
0070328
0071807
0076752
0079574
0082398
0086034
0090264
0093428
0097742
0113189
0120347
0143145
0246460
0381061
0830515
1074638
2379713
2382320
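The same extraction can also be sketched in Python, which is what the rest of the analysis uses. This is only an illustration, not the script from the notebook: the function name `extract_movie_ids` and the two-column sample are made up, and the only assumption about the real export is that each row contains a title URL with an id like `tt0055928/` somewhere in it.

```python
import csv
import io
import re

# An IMDb title id followed by a slash, as it appears in a title URL.
TITLE_ID = re.compile(r"tt(\d+)/")


def extract_movie_ids(csv_text):
    """Return the numeric part of the first title id found in each CSV row."""
    ids = []
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            match = TITLE_ID.search(field)
            if match:
                ids.append(match.group(1))
                break  # one id per row is enough
    return ids


# Hypothetical, shortened sample; the real export has more columns.
sample = (
    "Title,URL\n"
    "Dr. No,https://www.imdb.com/title/tt0055928/\n"
    "From Russia with Love,https://www.imdb.com/title/tt0057076/\n"
)

print(extract_movie_ids(sample))
```

Rows without a matching id, such as the header row, are simply skipped.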
With those ids and the Python library IMDbPY, I created a Jupyter notebook that produced a csv file with all the data I needed.
The IMDbPY library can return the list of actors for every movie. I was sure that the first actor in that list is the one who plays Bond. It turned out that this is true for all the movies except the upcoming one. The first actor for the movie "No Time to Die" is Ana de Armas:
So I had to fix one cell in the data by hand.
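The "first actor in the cast list, plus one manual fix" rule could look roughly like the sketch below. It is not the code from the notebook. The helper names `lead_actor` and `fetch_cast_names` and the `overrides` dictionary are invented for illustration, and the cast lists in the usage lines are shortened, hand-written stand-ins for what the library would return. `fetch_cast_names` assumes the IMDbPY package is installed and that network access is available; it is not called here.

```python
def lead_actor(movie_id, cast_names, overrides=None):
    """Pick the Bond actor: a manual override if present, else the first cast name."""
    overrides = overrides or {}
    if movie_id in overrides:
        return overrides[movie_id]
    return cast_names[0] if cast_names else None


def fetch_cast_names(movie_id):
    """Fetch the cast names for one movie id with IMDbPY (needs network access)."""
    from imdb import IMDb  # imported lazily so the rest runs without IMDbPY

    movie = IMDb().get_movie(movie_id)
    return [person["name"] for person in movie.get("cast", [])]


# The one cell fixed by hand: the last id in the list above, "No Time to Die".
overrides = {"2382320": "Daniel Craig"}

# Shortened, hand-written cast lists standing in for fetch_cast_names() results.
print(lead_actor("2382320", ["Ana de Armas", "Daniel Craig"], overrides))
print(lead_actor("0055928", ["Sean Connery", "Ursula Andress"], overrides))
```

Keeping the manual fix in a separate dictionary makes it visible which values do not come straight from IMDb.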
So, I have raw data about the James Bond movies. I loaded that data into a pandas DataFrame:
The very first thing I want to do with this data is find out when the movies were released. The first movie was released in 1962, so more than 50 years have passed since then. How often were the movies released?
Here is a graph:
The X axis is the year when the movie was released. The Y axis is just a constant that is the same for all the movies.
After staring at this graph for some time, I decided that the movies can be grouped this way:
A lot of actors have portrayed Bond, so maybe those groups are the movies with the same actor? To check this hypothesis I added movie titles and actor names to the same graph:
It turned out that I was right and wrong at the same time. The movies with Daniel Craig and Pierce Brosnan are exactly the groups I marked, but with the other actors it is more complicated.
Sean Connery. At first there were 4 movies, one a year. Then there was a pause. Then one more movie with Sean Connery. One more pause. Then a movie with George Lazenby (he is the only actor who played James Bond in just one movie). And one more pause. And then another movie with Sean Connery.
Two more movies follow, and then everything runs like clockwork: a movie every two years. This schedule did not change even when Timothy Dalton replaced Roger Moore.
Then there was a pause (the longest one). And Pierce Brosnan. The first 3 movies came every 2 years, the 4th after 3 years.
Then another pause, and Daniel Craig, with completely irregular releases.
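The pauses described above can be checked with a few lines of code. In this sketch the release years of the 24 released official movies are typed in by hand rather than read from the csv file, and the function name `release_gaps` is just an illustration.

```python
# Release years of the 24 released official movies, typed in by hand
# (an assumption for this sketch; the notebook reads them from the csv file).
YEARS = [
    1962, 1963, 1964, 1965, 1967, 1969, 1971, 1973,
    1974, 1977, 1979, 1981, 1983, 1985, 1987, 1989,
    1995, 1997, 1999, 2002, 2006, 2008, 2012, 2015,
]


def release_gaps(years):
    """Return (previous_year, next_year, gap) for each pair of consecutive releases."""
    ordered = sorted(years)
    return [(a, b, b - a) for a, b in zip(ordered, ordered[1:])]


gaps = release_gaps(YEARS)
longest = max(gaps, key=lambda gap: gap[2])
print(longest)  # (1989, 1995, 6)
```

The longest gap is the six-year pause before the first Pierce Brosnan movie.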
IMDb users can vote for the movies. They select the number of stars a movie deserves, from one star (horrible) to 10 stars (the best possible movie).
Then IMDb uses some complicated algorithm to calculate the movie rating. It is complicated in order to prevent fraudulent voting. And to keep the movie rating unbiased, IMDb does not explain how the rating is calculated.
Here is the graph with the IMDb rating of all 24 official James Bond movies. (The 25th movie is not released yet, so I do not have its rating.)
And here is a graph of how many people have voted for each movie:
It is interesting to see how the Daniel Craig movies fluctuate.
If the trend stays the same, the next movie should be a huge success.
It is also interesting that on the votes graph the movies with Pierce Brosnan and Daniel Craig are clearly visible. It is also visible that the movie "Moonraker" stands out with its rating.
I started working with this data because I came up with an idea for how to find out which actor is the best Bond. Every movie has a rating. I take all the movies with the same actor and calculate the arithmetic mean. That number is the way to compare actors with each other.
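A minimal sketch of that calculation, using only the Python standard library instead of pandas. The function names are illustrative, and the actors and ratings below are made-up placeholders, not real IMDb data.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_actor(rows):
    """Group (actor, rating) pairs by actor and average the ratings."""
    ratings = defaultdict(list)
    for actor, rating in rows:
        ratings[actor].append(rating)
    return {actor: mean(values) for actor, values in ratings.items()}


def rank_actors(rows):
    """Return (actor, mean rating) pairs, best average first."""
    averages = mean_rating_by_actor(rows)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up placeholder data, NOT real IMDb ratings.
rows = [
    ("Actor A", 7.0), ("Actor A", 8.0),
    ("Actor B", 6.0), ("Actor B", 7.0),
    ("Actor C", 7.0),
]

for actor, average in rank_actors(rows):
    print(actor, average)
```

Note that an actor with a single movie (like "Actor C" here) gets that movie's rating as the average, so one film can place an actor high or low in the ranking.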
And here are the results:
By this method Daniel Craig is the best Bond. The average rating of the movies where he plays Bond is higher than that of any other actor.
Sean Connery is in second place. Third place (and this is very surprising) goes to George Lazenby, who played Bond in only one movie.
But the most surprising thing for me is that the worst Bond is Pierce Brosnan. I completely disagree with that.
Interestingly, after the 25th movie the winner may change. It depends on the rating that movie gets. If the rating is 6.6 or lower, then the best Bond is Sean Connery. If the rating is 6.7 or higher, Daniel Craig will stay on top.
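The threshold follows from simple arithmetic. If an actor has n movies with mean m, and the next movie gets rating x, the new mean is (n*m + x) / (n + 1). It stays above a rival's mean r exactly when x > (n + 1)*r - n*m. Here is a sketch of that formula; the function name is illustrative and the numbers are made up, they are not the real averages from the data.

```python
def break_even_rating(current_mean, movie_count, rival_mean):
    """Rating the next movie must exceed for the mean to stay above rival_mean."""
    return (movie_count + 1) * rival_mean - movie_count * current_mean


# Made-up numbers: 4 movies averaging 7.5 against a rival average of 7.0.
print(break_even_rating(7.5, 4, 7.0))
```

Because IMDb shows ratings with one decimal place, the real boundary falls between two neighbouring values such as 6.6 and 6.7.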
It was announced that the 25th James Bond movie is the last one with Daniel Craig. But what is next? Will there be more James Bond movies? That is unknown.
Anything can happen. It is possible that the 25th movie is not only the last movie with Craig as Bond, but the last James Bond movie at all. A lot of things have happened to James Bond over such a long history, but he has never died. Maybe this is the time to die? But his death does not mean the death of the movies. Heroes can rise. Another possibility is that the next movie will show his life before those events.
There are other things that can happen. The next Bond could be a woman. Jane Bond.
It is also possible that the 26th movie will star some other actor, but Daniel Craig will return in the 27th.
Another thing that can happen is that somebody buys the rights to make a Bond movie outside of the canon (Elon Musk?)
PS The Jupyter notebooks and csv files that were used for this text are on GitHub.
11 February 2020