Overview
- Understand the state of the art (SOTA) for object detection.
- Build a tool for monitoring social distancing that runs in real time on CPU.
How to use YOLO v5 for person detection
Two months after the release of YOLO v4, YOLO v5 was released by Glenn Jocher, who had previously implemented YOLO v3 in PyTorch. The two versions share several components: a CSP backbone, PANet, and mosaic data augmentation. YOLO v5 has no published paper, but its efficiency has been demonstrated in practice: despite somewhat lower accuracy, it is much faster, which makes real-time applications on CPU possible.
CSPNet divides the input features into two equal parts: one is passed unchanged into a transition block, while the other goes through a dense block followed by a transition block. This cross-stage connection preserves information from earlier layers while reducing the model's complexity.
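As a rough illustration of the idea, here is a minimal PyTorch sketch (ours, not YOLO v5's exact block; the layers inside are placeholders):

import torch
import torch.nn as nn

class CSPBlock(nn.Module):
    # minimal sketch of a CSP connection: split the channels in two,
    # transform only one half, then merge both halves again
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.dense = nn.Sequential(          # stands in for the dense block
            nn.Conv2d(half, half, 3, padding=1),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )
        self.transition = nn.Conv2d(channels, channels, 1)  # transition block

    def forward(self, x):
        part1, part2 = x.chunk(2, dim=1)  # split input features into two equal parts
        part2 = self.dense(part2)         # only this half goes through the dense path
        return self.transition(torch.cat([part1, part2], dim=1))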

The deeper a network is, the more spatial information is lost. To detect small objects better, researchers proposed the FPN (Feature Pyramid Network). PANet (Path Aggregation Network) extends the FPN with an extra bottom-up path that brings localisation information from the lower layers back up to the top ones.
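To make the two paths concrete, here is a minimal sketch (ours, not YOLO v5's exact neck, which uses convolutions rather than plain addition and pooling):

import torch.nn.functional as F

def panet_neck(c3, c4, c5):
    # assumes c3, c4, c5 are backbone features already projected to the
    # same channel count, with c3 the highest resolution
    # top-down path (FPN): spread semantic information to finer levels
    p5 = c5
    p4 = c4 + F.interpolate(p5, scale_factor=2, mode="nearest")
    p3 = c3 + F.interpolate(p4, scale_factor=2, mode="nearest")
    # extra bottom-up path (PANet): carry localisation signals upward again
    n3 = p3
    n4 = p4 + F.max_pool2d(n3, kernel_size=2)
    n5 = p5 + F.max_pool2d(n4, kernel_size=2)
    return n3, n4, n5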

Mosaic data augmentation: each training image is a combination of 4 images, which makes the context of each image richer.
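A simplified sketch of the idea (real mosaic augmentation also picks a random center and remaps the bounding-box labels, both omitted here):

import cv2
import numpy as np

def mosaic(images, size=640):
    # paste 4 images into the 4 quadrants of one training image
    assert len(images) == 4
    half = size // 2
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    for i, img in enumerate(images):
        img = cv2.resize(img, (half, half))
        r, c = divmod(i, 2)  # quadrant row and column
        canvas[r * half:(r + 1) * half, c * half:(c + 1) * half] = img
    return canvas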

Install and run the code for people detection
# clone yolo v5 and run people detection
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
There are 4 pre-trained checkpoints, YOLOv5s/m/l/x; the smallest one, YOLOv5s, is the version that runs in real time on CPU.
python detect.py --source file.mp4 --weights yolov5s.pt --conf 0.4
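Alternatively, if you want the detections inside your own Python code rather than through detect.py, the repository exposes the models via torch.hub. A minimal sketch (class 0 is "person" in the COCO label set these checkpoints are trained on):

import torch

# downloads yolov5s.pt on first use
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.conf = 0.4     # confidence threshold
model.classes = [0]  # keep only the "person" class

results = model("images/2.png")
boxes = results.xyxy[0]  # one (x1, y1, x2, y2, confidence, class) row per detection
print(boxes)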

Camera Perspective Transformation
As the input video is taken from an arbitrary perspective, we need to convert it to a bird's eye view (top-down view). Since we have a single camera, the simplest transformation method is to choose 4 points defining the RoI where we want to monitor social distancing and map them to the corners of a rectangle in the bird's eye view. These points should form a rectangle in the real world if seen from above. In this top-down view, distances are distributed uniformly both horizontally and vertically (though the scales for the two directions will differ). From this mapping we can derive a transformation that can be applied to the entire perspective image.

import cv2
import numpy as np


def four_point_transform(image, pts):
    # points are ordered: top-left, top-right, bottom-right, bottom-left
    (tl, tr, br, bl) = pts

    # width of the warped image: the larger of the bottom and top edges
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    # height of the warped image: the larger of the right and left edges
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # destination rectangle in the bird's eye view
    dst = np.array([[0, 0],
                    [maxWidth - 1, 0],
                    [maxWidth - 1, maxHeight - 1],
                    [0, maxHeight - 1]], dtype="float32")

    M = cv2.getPerspectiveTransform(pts, dst)
    # or use this function
    # M = cv2.findHomography(pts, dst)[0]
    print("angle of rotation: {}".format(np.arctan2(-M[1, 0], M[0, 0]) * 180 / np.pi))

    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    return warped


# mouse callback: collect the 4 points that define the RoI
def selectROI(event, x, y, flags, param):
    global imagetmp, roiPts
    if event == cv2.EVENT_LBUTTONDOWN and len(roiPts) < 4:
        roiPts.append((x, y))
        print(x, y)
        cv2.circle(imagetmp, (x, y), 2, (0, 255, 0), -1)
        if len(roiPts) > 1:  # draw an edge from the previous point
            cv2.line(imagetmp, roiPts[-2], (x, y), (0, 255, 0), 2)
        if len(roiPts) == 4:  # close the polygon back to the first point
            cv2.line(imagetmp, roiPts[0], (x, y), (0, 255, 0), 2)
        cv2.imshow("image", imagetmp)


print("select ROI")


def main():
    global imagetmp, roiPts
    roiPts = []
    image = cv2.imread("images/2.png")
    imagetmp = image.copy()
    cv2.namedWindow("image")
    cv2.setMouseCallback("image", selectROI)
    while len(roiPts) < 4:
        cv2.imshow("image", imagetmp)
        cv2.waitKey(500)
    roiPts = np.array(roiPts, dtype=np.float32)
    warped = four_point_transform(image, roiPts)
    cv2.imshow("Warped", warped)
    cv2.waitKey()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()
Create a tool for monitoring
We need 7 points on the first frame. The first 4 points define the RoI where we want to monitor social distancing; they should form a rectangle in the real world if seen from above (bird's eye view), and are given in the order top-left, top-right, bottom-right, bottom-left. The last 3 points define the limit distance horizontally and vertically, given in the order bottom-left, bottom-right, top-left.
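As a sketch of how those last 3 points become per-axis scaling factors (the function name is ours, and it assumes you also keep the matrix M returned by cv2.getPerspectiveTransform in the code above):

import cv2
import numpy as np

def distance_scales(M, dist_pts):
    # dist_pts: bottom-left, bottom-right, top-left points marking the
    # limit distance (e.g. 2 meters) horizontally and vertically
    pts = np.array([dist_pts], dtype=np.float32)  # shape (1, 3, 2)
    warped = cv2.perspectiveTransform(pts, M)[0]
    bl, br, tl = warped
    horizontal = np.linalg.norm(br - bl)  # warped pixels per limit distance, x axis
    vertical = np.linalg.norm(tl - bl)    # warped pixels per limit distance, y axis
    return horizontal, vertical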
Distance Calculation
After using YOLO v5 for person detection, we have a bounding box for each person, and now we need to calculate the distance between every two people in the frame.
Assume every person is standing on the same flat ground plane. First, we take the center-bottom point of each bounding box and convert these (x, y) points to the bird's eye view.
Second, looking at the illustration above, we define the limit distance (e.g. the distance between the two blue points is 2 meters) along the width and height of the RoI, and convert these blue points to the bird's eye view as well.
Last, we compute the bird's eye view distance between every pair of people and scale it by the horizontal and vertical scaling factors. If the distance is smaller than the defined threshold, those people are violating the social distancing policy.
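Putting the three steps together, a minimal sketch (the helper name is ours; boxes come from the detector, and horizontal/vertical are the scaling factors from the previous sketch):

import cv2
import numpy as np
from itertools import combinations

def find_violations(boxes, M, horizontal, vertical):
    # boxes: list of (x1, y1, x2, y2) person detections in the original frame
    if len(boxes) < 2:
        return set()
    # center-bottom point of each box, warped to the bird's eye view
    feet = np.array([[((x1 + x2) / 2, y2) for x1, y1, x2, y2 in boxes]],
                    dtype=np.float32)
    warped = cv2.perspectiveTransform(feet, M)[0]
    # rescale so that one unit equals the limit distance on each axis
    scaled = warped / np.array([horizontal, vertical])
    violations = set()
    for i, j in combinations(range(len(boxes)), 2):
        if np.linalg.norm(scaled[i] - scaled[j]) < 1.0:
            violations.update((i, j))
    return violations  # indices of people violating social distancing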
Output

Red and green correspond to distance violators and non-violators, respectively.
Conclusion
In this time of the COVID-19 pandemic, a solution like this one, built on YOLO v5, can help scale up the enforcement of social distancing policies. YOLO v5 can be applied to far more problems than the one shown here, which I will cover in the next article. Stay tuned!!
We at BlueEye AI are deeply interested in collaboration and development. If you are looking to apply computer vision or artificial intelligence to your solutions or operations, contact us at [email protected]
At BlueEye AI, we aim to provide end-to-end production services (from data sourcing to application) to our clients.