The FIFA World Cup has had a recent habit of introducing new media technologies into the sporting world. That’s not surprising given its status as a global spectacle. Given the many controversial aspects of this year’s event in Qatar, the introduction of the “semi-automated video assistant referee” is relatively mild. However, it’s an excellent example of the way in which emerging technologies create new spatiotemporal regimes.
Basically, the semi-automated video referee consists of 12 tracking cameras mounted around the stadium that track 29 data points on each player on the pitch along with the ball. There is an additional inertial sensor in the ball. With the use of computer vision and AI, images like the following are produced indicating whether or not a player is in an offside position.
The previous World Cup had introduced the original video assistant referee (VAR), which relied on pitch-side cameras and lines drawn across images to check for offside. The new semi-automated system creates a 360-degree view that refreshes 50 times per second (alongside the inertial sensor in the ball, which samples 500 times per second). As such, a picture can be drawn at the moment the ball is struck (and its momentum shifts).
So that’s the technology in a nutshell.
Now, briefly about the offside rule, which can be confusing for those who don’t follow the sport. Basically, attacking players are offside if a part of their body that can legally play the ball (i.e. not their hands and arms) is closer to the goal than the same body parts of the second-to-last defender. Usually the last defender is the goalkeeper, so in practice the line is set by one other defender. In the image above, the attacking player (in white) is offside because their knee and toe are ahead of that defender.
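For readers who think better in code, the comparison described above can be sketched roughly as follows. This is only an illustration of the rule as I've summarized it, not how FIFA's system is implemented: the `Player` class, the `is_offside` function, and the idea of reducing each tracked body point to a single distance from the goal line are all my own assumptions. It also leaves out the other conditions of the rule (the position of the ball, being in the opponent's half).

```python
from dataclasses import dataclass


@dataclass
class Player:
    # Hypothetical representation: distance in metres from the defended
    # goal line for each tracked body point that can legally play the
    # ball (hands and arms are left out).
    points: list[float]

    def nearest_to_goal(self) -> float:
        # The body point closest to the goal line is the one that counts.
        return min(self.points)


def is_offside(attacker: Player, defenders: list[Player]) -> bool:
    """Simplified check: is the attacker nearer the goal line than the
    second-to-last defender? Being exactly level counts as onside."""
    if len(defenders) < 2:
        raise ValueError("need at least two defenders (usually one is the goalkeeper)")
    depths = sorted(d.nearest_to_goal() for d in defenders)
    second_to_last = depths[1]
    return attacker.nearest_to_goal() < second_to_last


# Made-up numbers: a goalkeeper on the line, a defender whose nearest
# point is 9.9 m out, and an attacker whose toe is 9.8 m out.
goalkeeper = Player(points=[1.0, 1.4])
defender = Player(points=[9.9, 10.2])
attacker = Player(points=[10.0, 9.8])
print(is_offside(attacker, [goalkeeper, defender]))
```

With these invented numbers the attacker's toe is ten centimetres beyond the defender, which is the kind of margin the images from the tournament show.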
Until 2018, offside was called by the assistant referees who run along the side of the pitch. They had to determine whether a player was offside at the moment the ball was played… and the ball might be played from the other side of the pitch. Obviously there was a fair amount of subjectivity in making those judgements. Instant replay made errors quite obvious to the TV audience, which is part of why VAR was originally developed.
At the same time, the semi-automated video assistant referee asks players to move their bodies through a mixed-reality space that they cannot sense. The offside above (and there are many examples that are this close from actual games) is imperceptible to humans in “real time.” Unsurprisingly, playing right on this offside line is an important tactical part of the game for both attacking and defending, and the players are quite adept, so the decisions are often close. But what we have here is really a random event from the perspective of human agency. These are groups of humans acting collectively in the space of fractions of a second. In the image above, if the ball had been struck a few hundredths of a second sooner or later, the player might have been onside.
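To put rough numbers on "a few hundredths of a second," here is a small back-of-the-envelope calculation. The 50 and 500 samples per second come from the description of the system above; the sprinting speed of 8 metres per second is my own assumption for illustration, as are the function names.

```python
def sample_interval_ms(rate_hz: float) -> float:
    # Time between two consecutive samples, in milliseconds.
    return 1000.0 / rate_hz


def displacement_cm(speed_m_per_s: float, seconds: float) -> float:
    # Distance covered at a constant speed, in centimetres.
    return speed_m_per_s * seconds * 100.0


CAMERA_RATE_HZ = 50       # tracking cameras, as described above
BALL_SENSOR_RATE_HZ = 500  # inertial sensor in the ball, as described above
ASSUMED_SPRINT_SPEED = 8.0  # metres per second; an assumption, not a measured value

camera_gap_s = sample_interval_ms(CAMERA_RATE_HZ) / 1000.0
sensor_gap_s = sample_interval_ms(BALL_SENSOR_RATE_HZ) / 1000.0

print(f"camera frame gap: {sample_interval_ms(CAMERA_RATE_HZ):.0f} ms")
print(f"ball sensor gap:  {sample_interval_ms(BALL_SENSOR_RATE_HZ):.0f} ms")
print(f"movement per camera frame: {displacement_cm(ASSUMED_SPRINT_SPEED, camera_gap_s):.1f} cm")
print(f"movement per sensor tick:  {displacement_cm(ASSUMED_SPRINT_SPEED, sensor_gap_s):.1f} cm")
```

Under that assumed speed, a player covers about 16 centimetres in the 20 milliseconds between camera frames, more than the width of a knee or a toe, and an attacker and defender running in opposite directions would close or open that gap even faster.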
Of course this is “just” a sport (setting aside the massive amount of money changing hands at a World Cup). But these are technologies that are surely coming our way. The volumetric capture mentioned in the title is part of developing technologies of mixed and virtual reality. In addition to sports, arts, and entertainment, creating holograms of people has uses in many industries, from health to suggesting clothing sizes. High-end volumetric capture might involve a studio and more than 100 cameras, but more mundane versions of this already work with smartphone cameras (even though they aren’t 3D images, since the image is coming from only one point). How might future security camera set-ups create a new legal regime based upon an analysis of nonhuman time and space? How might such data inform our self-image or our efforts to “optimize” ourselves according to some algorithmic construction of health or beauty or ethics?
Put more abstractly, how might these media shape our agency? In the Latourian sense of actors being those who are “made to act,” how might these technologies demand behaviors from us (for our own good of course)? How might they open new potentials for action by providing some interface with a new space and time?