Luxand FaceSDK – Tracker API

What is Tracker API

Tracker API is a set of functions that allows for recognizing subjects in live video streams. The API receives the video frame by frame, and assigns a unique identifier (ID) to each subject detected in the video. Thus, each subject can be identified by its ID across the video. You can attach a name tag to an identifier, and query any identifier for its name. The API also allows simple face tracking (without registering subjects); tracking of the coordinates of either all facial features or just the eye centers; and recognition of subjects' gender, age and facial expression. The API provides an estimate of both the recognition rate and the false acceptance rate as the video progresses.

If your task is to track or recognize faces in video streams, consider using Tracker API instead of manually calling functions like FSDK_DetectFace, FSDK_DetectFacialFeatures or FSDK_GetFaceTemplate for each frame (“manual handling”). The difference between Tracker API and manual handling is summarized in the table below.

 

 

Development effort

Tracker API: A developer uses a few Tracker API functions to handle an incoming video frame and to set and retrieve subjects’ names. The API automatically learns the appearance of every detected subject.

Manual handling: A developer must implement two modes in the program: a mode to enroll subjects and a mode to recognize faces. In the enrollment mode, the program must store a certain (usually experimentally determined) number of face templates in the database while the subject poses in front of the camera. In the recognition mode, a template is created for each detected face and matched against the database.

Performance

Tracker API: The API constantly learns how subjects appear, so its recognition rate is usually higher than that of a system that merely stores several templates per subject.

Manual handling: If environmental conditions (such as lighting) change after enrollment, the system may not recognize the subject, and a new enrollment will be required.

Recognition rates

Tracker API: Tracker API provides estimates of the recognition rate and false acceptance rate specifically for video streams.

Manual handling: FSDK_MatchFaces provides FAR/FRR values for matching a pair of images. It is typically hard to estimate how storing several templates per person affects the recognition rate, how often false acceptances occur as the video progresses, and whether the false acceptance rate increases as more subjects are enrolled.

Enrollment

Tracker API: The subject is generally not required to pose. By the time the operator assigns a name to the subject, Tracker API has likely already captured enough views of the subject to recognize it in later frames.

Manual handling: The subject is required to pose in front of the camera so the system can capture the face in different views, under different environmental conditions, and with different facial expressions.

Recognition without enrollment

Tracker API: Every subject is recognized, regardless of whether it has already been tagged with a name; the API assigns a unique ID to track the subject across the video. This enables surveillance applications, in which subjects cannot be required to participate willingly (that is, to pose) in order to be enrolled for recognition.

Manual handling: Only enrolled subjects can be recognized. The requirement to participate actively makes surveillance applications difficult.

Tracking of multiple faces

Tracker API: The API tracks, recognizes, and allows assigning names to multiple faces simultaneously present in the video frame.

Manual handling: Usually only a single subject can pose in front of the camera during enrollment. If other subjects are visible, the system may mistakenly store their templates in the subject’s database record. A separate tracking mechanism is required to decide whether a detected face belongs to the enrolled subject.

Facial feature detection

Tracker API: Tracker API tracks the facial feature coordinates of each subject in the video frame. Jitter is eliminated by smoothing.

Manual handling: The coordinates detected by FSDK_DetectFacialFeatures may jitter because of noise present in the video. If multiple faces are present, a tracking mechanism is required to implement smoothing.

Gender and age recognition

Tracker API: The API identifies the gender and age of each subject tracked in the video. Analyzing the video usually provides higher recognition rates than still-image gender and age recognition.

Manual handling: When each video frame is treated as a still image, gender and age recognition rates are usually lower.

Facial expression recognition

Tracker API: The API identifies whether a smile is present and whether the eyes are open or closed for each subject tracked in the video. Analyzing the video usually provides higher recognition rates than still-image expression recognition.

Manual handling: When each video frame is treated as a still image, expression recognition rates are usually lower.

Understanding Identifiers

The API analyzes the video stream sequentially, frame by frame. For each frame, the FSDK_FeedFrame function returns the list of identifiers (integer numbers) of faces recognized in this frame. The purpose of an identifier is to assign a unique number to each subject in the video. If a face is similar to one recognized previously, it receives the same identifier. Otherwise, a new identifier (in ascending numeric order) is assigned. Thus, subjects recognized as different should get different identifiers.

It is important to note that the identifier value is meaningful only within a particular video stream. Identifiers of the same subject are not expected to be the same across different video streams.

A subject can have several identifiers

The same subject can get different identifiers in different frames (for example, ID1 in the first frame and ID2 in the second, ID2 > ID1), if the system was not able to match its face to faces seen previously (which might happen if the subject's appearance in the second frame was notably or unexpectedly different).

Merger of identifiers

However, as the video progresses, the system learns more about the appearance of each person; at some point it may deduce that ID1 and ID2 actually represent the same person. In such a case (and if it is possible) it merges both identifiers into ID1, further returning ID1 for every novel recognized occurrence of this subject. The system retains the information of all merger events, so it is possible to receive the resulting value of an early assigned identifier (for example, receive the ID1 value when providing the ID2 value) by calling the FSDK_GetIDReassignment function. Note that if an identifier was tagged with a name, it can be merged only with other identifiers that are untagged; in such a case the tagged name is retained.

When calling Tracker API functions with identifiers received on earlier frames, it is always recommended to convert the identifier values with the FSDK_GetIDReassignment function first, and only then pass them to Tracker API. The reason is that they may have been merged on subsequent frames, so the corresponding subjects may now be represented by other identifier values.
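For example, if your application stores identifiers between frames, the conversion might look like the following minimal C++ sketch. It assumes the declarations from the FaceSDK C/C++ header (here assumed to be LuxandFaceSDK.h); the helper name and the container are this sketch's own, and exact signatures may differ slightly between SDK versions.

#include "LuxandFaceSDK.h"
#include <vector>

// Convert previously stored identifiers to their current values, since some of
// them may have been merged into earlier identifiers on subsequent frames.
void RefreshStoredIDs(HTracker tracker, std::vector<long long>& storedIDs)
{
    for (long long& id : storedIDs) {
        long long reassignedID = id;
        if (FSDK_GetIDReassignment(tracker, id, &reassignedID) == FSDKE_OK)
            id = reassignedID;
    }
}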

When identifiers are not merged

The API supports tagging an identifier with a name, provided by the user. If identifiers are tagged with different names, they will not be merged.

The appearances of each subject are stored in the memory (see the Memory section). If a subject has been tagged with a name, and the memory for this subject is full, it will not be merged with any other identifier (because such a merger requires additional memory for the subject).

Similar identifiers

The identifier returned by the FSDK_FeedFrame function can be similar enough to other identifiers for the API to decide they represent the same person. Still, some reason (such as the one described above) may prevent them from merging. In such a case, the identifiers similar to an ID can be retrieved using the FSDK_GetSimilarIDList function.

You should always retrieve the list of similar identifiers when deciding whether a recognized face belongs to a certain subject. Let us assume that you have a particular subject of interest and should respond when it is recognized. You may have stored an identifier of that subject, or assigned a name to it with FSDK_SetName, and wait for that identifier (or name) to appear. (Keep in mind that you need to adjust the stored identifier with FSDK_GetIDReassignment after calling FSDK_FeedFrame.) When the subject appears, however, there is no guarantee that the stored identifier will be returned by the FSDK_FeedFrame function. Instead, it may appear in the list of similar identifiers. Therefore, you should compare your identifier against the list of similar identifiers for each ID returned by FSDK_FeedFrame. Accordingly, you need to retrieve the names of each similar identifier, for each ID returned by FSDK_FeedFrame, to find out whether any of these names belongs to the subject of interest. If you do not consider such lists of similar identifiers, your recognition rate will be lower (that is, you may miss an appearance of the subject of interest). Your false acceptance rate will be lower as well, but the drop in recognition rate will be larger than if you had set a higher recognition threshold (see the Recognition Performance section) and handled similar identifiers.

The function FSDK_GetAllNames implements the above functionality – it returns the name of an identifier, concatenated with the names (if any) of similar identifiers, separated by a semicolon.
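As an illustration, the check described above could look like the following sketch. The subject name, buffer size and helper name are arbitrary; FSDK_GetAllNames is assumed to follow the FaceSDK C/C++ header and to return the names separated by semicolons as described above.

#include "LuxandFaceSDK.h"
#include <string>

// Returns true if the subject tagged with subjectName may be present under the
// given ID. FSDK_GetAllNames already concatenates the names of similar
// identifiers, so a single call per ID is sufficient.
bool IsSubjectOfInterest(HTracker tracker, long long id, const std::string& subjectName)
{
    char names[1024] = "";
    if (FSDK_GetAllNames(tracker, id, names, sizeof(names)) != FSDKE_OK)
        return false;

    std::string list(names);
    size_t start = 0;
    while (start <= list.size()) {
        size_t end = list.find(';', start);
        if (end == std::string::npos) end = list.size();
        std::string token = list.substr(start, end - start);
        // Trim surrounding spaces in case the separator is "; ".
        const size_t b = token.find_first_not_of(' ');
        const size_t e = token.find_last_not_of(' ');
        if (b != std::string::npos && token.substr(b, e - b + 1) == subjectName)
            return true;
        start = end + 1;
    }
    return false;
}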

Tracker Memory

The API allows limiting the memory used by a tracker. The memory size is measured in the total number of facial appearances stored (about 11 Kbytes per appearance when the KeepFaceImages parameter is set to true, and about 1.5 Kbytes when it is set to false). By default, the limit is 2150 appearances (about 24 Mbytes or 3 Mbytes, depending on the value of the KeepFaceImages parameter). You can change the limit by setting the MemoryLimit parameter (see the Tracker Parameters section) to the desired value.

Memory available for each subject

For each subject tagged with a name, the amount of memory available is

max(1, memoryLimit/(subjectCount+1))

where subjectCount is the total number of subjects tagged with a name. The remaining memory is dedicated to untagged identifiers.

If, when setting a name with FSDK_SetName, there is not enough room for a new subject, the API will return the FSDKE_INSUFFICIENT_TRACKER_MEMORY_LIMIT error.
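To illustrate, a minimal sketch of this calculation and of handling the error follows (the helper names and example values are this sketch's own; the error constant is the one named above):

#include "LuxandFaceSDK.h"
#include <algorithm>

// Appearances available per tagged subject: max(1, memoryLimit/(subjectCount+1)).
long long PerSubjectMemory(long long memoryLimit, long long subjectCount)
{
    return std::max(1LL, memoryLimit / (subjectCount + 1));
}

// Tagging a subject can fail when there is no room for a new tagged subject.
bool TagSubject(HTracker tracker, long long id, const char* name)
{
    int res = FSDK_SetName(tracker, id, name);
    if (res == FSDKE_INSUFFICIENT_TRACKER_MEMORY_LIMIT)
        return false;  // raise MemoryLimit or erase unused subjects, then retry
    return res == FSDKE_OK;
}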

Imposing memory limits

If the memory limit for an identifier tagged with a name is reached, no new appearances of that subject are stored. That is, the system stops learning novel appearances of the subject. Furthermore, the identifier will not be merged with any other identifiers.

If the memory limit is reached for untagged identifiers, the earliest untagged facial appearance is purged when FSDK_FeedFrame is called. Note that only a particular appearance of some untagged identifier is purged, not the identifier's entire record of appearances; identifiers that have only one occurrence are purged completely. To prevent purging, you may use the FSDK_LockID function.

Note that if an identifier is tagged, and does not occupy more memory than available per subject, its facial appearances are not purged.

How to set the memory limit

The higher the limit, the more identifiers you can tag, and the more facial appearances can be stored for each identifier (thus improving the recognition rate). However, the Threshold parameter should also be higher (but setting too high a Threshold has its downsides – see the Recognition Performance section), for the false acceptance rate to stay at an acceptable level.

When increasing MemoryLimit, the frame rate may decrease. Therefore, it is practical to choose a memory limit that will allow for a sufficient frame rate, will not require too high a threshold, and will consume only a certain amount of memory, while at the same time allowing for the storage of the desired number of subjects.

See the Recognition Performance section to find which Threshold values should be chosen with different memory limits to achieve the desired recognition rates.

Tracker Parameters

Each HTracker instance allows setting a number of parameters with the FSDK_SetTrackerParameter or FSDK_SetTrackerMultipleParameters function.
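For example, several of the parameters described below can be set in a single call. The sketch assumes the semicolon-separated name=value format used in the FaceSDK samples; the specific values shown are arbitrary, not recommendations.

#include "LuxandFaceSDK.h"
#include <cstdio>

// Set several tracker parameters at once. ErrorPosition receives the position
// of the first erroneous character in the parameter string, if any.
void ConfigureTracker(HTracker tracker)
{
    int errorPosition = 0;
    int res = FSDK_SetTrackerMultipleParameters(tracker,
        "RecognizeFaces=true; DetectGender=true; DetectAge=true; "
        "InternalResizeWidth=256; FaceDetectionThreshold=5; MemoryLimit=2150;",
        &errorPosition);
    if (res != FSDKE_OK)
        std::printf("Parameter error at position %d\n", errorPosition);
}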

Face tracking parameters

Note that the Tracker API does not use the parameters of face detection, set with FSDK_SetFaceDetectionParameters or FSDK_SetFaceDetectionThreshold. Instead, you should use the Tracker API parameters below.

FaceDetectionModel, TrimOutOfScreenFaces, TrimFacesWithUncertainFacialFeatures – parameters analogous to those described in the FaceSDK Parameters section. Their default values are (default, true, true).

HandleArbitraryRotations, DetermineFaceRotationAngle, InternalResizeWidth – parameters analogous to those of FSDK_SetFaceDetectionParameters. Their default values are (false, false, 256).

FaceDetectionThreshold – a parameter analogous to the one of FSDK_SetFaceDetectionThreshold. The default value is 5.

FaceTrackingDistance – specifies the maximum distance between the faces of one person in consecutive frames for them to be considered an uninterrupted tracking sequence. The parameter is measured in widths of the detected face. The default value is 0.5. You may decrease it when the frame rate is high, to lower the probability of false acceptances, or increase it when the frame rate is low and the recognition rate suffers from interrupted tracking.

Face recognition parameters

RecognizeFaces – whether to recognize the subject's identity. If set to true, the system attempts to assign each subject a unique ID, giving the same identifier to the same subject across the video. If set to false, the system returns a unique ID value for every uninterrupted sequence of a detected face (that is, when a certain face is detected in every frame of the sequence), regardless of the identity of the face. The default value is true.

DetectGender – whether to recognize the gender of a subject. Gender recognition requires the detection of facial features, so when set to true, facial features are detected regardless of the DetectFacialFeatures parameter. The default value is false.

DetectAge – whether to recognize the age of a subject. Age recognition requires the detection of facial features, so when set to true, facial features are detected regardless of the DetectFacialFeatures parameter. The default value is false.

DetectExpression – whether to recognize facial expression of a subject. Expression recognition requires the detection of facial features, so when set to true, facial features are detected regardless of the DetectFacialFeatures parameter. The default value is false.

DetectLiveness – whether to perform passive liveness detection. See the Passive Liveness section for more details. The default value is false.

Learning – whether to learn subjects’ appearances. If set to true, the API will learn the appearance of each subject, unless its memory is full, and add new subjects to the memory. If set to false, the system will return only the identifiers already present in the memory; no addition of novel subjects, novel facial appearances, or merger of identifiers will occur. If a subject does not match any appearance stored in the memory, FSDK_FeedFrame will return the -1 identifier for that subject. Set this flag to false if you have a reason not to alter the memory. The default value is true.

MemoryLimit – the amount of memory available for the storage of facial appearances. See the Tracker Memory section below. The default value is 2150.

Threshold – the threshold used when deciding if two facial appearances belong to the same subject. Each threshold value alters both the false acceptance rate and recognition rate. See the Recognition Performance below. The default value is 0.992.

KeepFaceImages – whether to store the original facial images in the Tracker memory. See the Storing original facial images section for details. The default value is true.

Facial feature tracking parameters

DetectEyes – whether to detect eyes. Eyes are detected regardless of the value of this parameter when RecognizeFaces is set to true. When eyes are detected, their coordinates can be retrieved with FSDK_GetTrackerEyes. The default value is false.

DetectFacialFeatures – whether to detect facial features. Facial features are detected when RecognizeFaces is set to true, regardless of the value of this parameter. They are also detected if DetectGender, DetectAge or DetectExpression is set to true. The default value is false.

DetectAngles – whether to estimate out-of-plane face rotation angles by using the detected facial features. Pan and Tilt are returned as the Angles facial attribute. The default value is false.

FacialFeatureJitterSuppression – whether to suppress the jitter of facial features by employing more processor resources. If 0, such jitter suppression is not employed. Set to a higher value for better suppression. A non-zero setting takes effect only when DetectFacialFeatures=true, even if facial features are actually detected due to the setting of the RecognizeFaces, DetectGender, DetectAge or DetectExpression parameters. The default value depends on NUM_THREADS, the number of threads supported by the CPU, which can be obtained using the FSDK_GetNumThreads function. The default value is NUM_THREADS – 1 when NUM_THREADS <= 4; 3 when NUM_THREADS <= 8; NUM_THREADS/2 - 1 when NUM_THREADS <= 40; and 19 otherwise. Note that the way the default value is selected may change the behavior of Tracker API from system to system. That is, systems supporting more threads will display smoother facial feature coordinates by default. This may also change the behavior of face recognition and attribute detection (although it will not change their average accuracy). If you need Tracker API to return the same output for the same input data regardless of the number of threads, set this parameter to a fixed value in your application.

SmoothFacialFeatures – whether to smooth facial features from frame to frame to prevent jitter. If set to false, the coordinates of facial features are detected independently of the previous frame, and may jitter because of the noise present in the video. If the parameter is set to true, the API will smooth the coordinates of facial features. The default value is true.

FacialFeatureSmoothingSpatial – a coefficient employed in facial feature smoothing. Controls spatial smoothing of facial features. The default value is 0.5.

FacialFeatureSmoothingTemporal – a coefficient employed in facial feature smoothing. Affects temporal smoothing of facial features (that is, how the smoothed coordinates relate to their coordinates on the previous frame). The default value is 250.

Tuning for Optimal Performance

A higher frame rate of FSDK_FeedFrame calls (that is, faster processing of frames) usually improves the recognition rate for live video, because more facial appearances of a person can be captured per unit of time.

Experiment with face detection parameters, especially with InternalResizeWidth: higher values allow faces to be detected at a greater distance, but require additional time (and lower the frame rate). If you find a high number of false detections (that is, faces detected where none are present), try increasing the FaceDetectionThreshold parameter.

Setting DetectGender, DetectAge or DetectExpression to true will lower the frame rate. If you need only to detect gender, age or facial expressions, you may consider setting the RecognizeFaces parameter to false, in order to increase the frame rate.

Using the API

The API allows for creating several trackers within the program, each having a separate memory for the recognized subjects and their names.

The tracker is represented with the HTracker data type.

C++ Declaration:

typedef unsigned int HTracker;

Locking identifiers

There are cases when you need to work with (or tag) an identifier across several frames. For example, you may have the user interface running in a different thread than FSDK_FeedFrame. Then there is a chance that, when a user selects an untagged identifier and starts to enter a name for it, the identifier will be purged by FSDK_FeedFrame running in parallel (see the Tracker Memory section). To prevent this, call the FSDK_LockID function as soon as the user selects an identifier. The function prevents the untagged identifier from being purged.
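A sketch of such a UI flow is shown below; the handler names and the threading model are the application's own, and only the FSDK_LockID, FSDK_SetName and FSDK_UnlockID calls belong to Tracker API.

#include "LuxandFaceSDK.h"

// Called from the UI thread when the user clicks a face. Locking prevents the
// untagged identifier from being purged while FSDK_FeedFrame keeps running.
void OnFaceClicked(HTracker tracker, long long id)
{
    FSDK_LockID(tracker, id);
    // ...show an input box and let the user type a name...
}

// Called when the user confirms (or clears) the name.
void OnNameEntered(HTracker tracker, long long id, const char* name /* "" erases */)
{
    FSDK_SetName(tracker, id, name);
    FSDK_UnlockID(tracker, id);  // unlock the identifier in any case
}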

Multiple camera support

Tracker API is designed to support multiple cameras, though in the current release only a single camera is supported. You should pass 0 as the CameraIdx parameter to every function that accepts it. You should not alternate frames from multiple cameras while sending them to FSDK_FeedFrame, since it will disrupt the tracking process, and yield a lower recognition rate and a higher false acceptance rate. It is also not recommended to switch from one camera to another while sending the frames using FSDK_FeedFrame. It is acceptable, however, to switch cameras before the memory of the tracker is loaded with FSDK_LoadTrackerMemoryFromFile or FSDK_LoadTrackerMemoryFromBuffer.

Storing original facial images

As the internal format of facial appearances may change in future versions of FaceSDK, Tracker API has the KeepFaceImages parameter, which controls whether the original facial images are stored in the Tracker memory. If the format changes, you will be able to convert your Tracker memory to the new format automatically (if you’ve stored the original facial images). In such a case, you won’t need to reenroll your subjects. It is recommended that you keep this parameter set to true, its default setting.

When the KeepFaceImages parameter is set to true, Tracker API stores an original facial image along with every facial appearance in the Tracker memory. The size of a facial appearance is about 1.5 Kbytes when KeepFaceImages is set to false, and about 11 Kbytes when KeepFaceImages is set to true. Note that if you’ve had this parameter set to false and accumulated some facial appearances, their original facial images will be lost, even if you set KeepFaceImages to true after that.

If you don’t want the original facial images to be stored in the Tracker memory, set this parameter to false.

Usage Scenario

The following scenario is employed when using Tracker API (a minimal C++ sketch of the per-frame handling in step 4 appears after the list).

1. Create a tracker (FSDK_CreateTracker) or load it from a file (FSDK_LoadTrackerMemoryFromFile) or from a memory buffer (FSDK_LoadTrackerMemoryFromBuffer).

2. Set tracker parameters (FSDK_SetTrackerParameter, FSDK_SetTrackerMultipleParameters), such as face detection parameters, recognition precision, or the option to recognize gender/age/facial expression or to detect facial features.

3. Open a video camera (FSDK_OpenVideoCamera, FSDK_OpenIPVideoCamera), or prepare to receive video frames from another source.

4. In a loop:

1) Receive a frame from a camera (FSDK_GrabFrame) or another source.

2) Send the image to the FSDK_FeedFrame function.

3) Display the image on a screen.

4) For each ID returned by FSDK_FeedFrame:

i. Retrieve its facial coordinates (FSDK_GetTrackerFacePosition), eye center coordinates (FSDK_GetTrackerEyes), facial feature coordinates (FSDK_GetTrackerFacialFeatures), gender, age or facial expression (FSDK_GetTrackerFacialAttribute).

ii. Retrieve the list of possible names (FSDK_GetAllNames).

iii. If, relying on the coordinates, you find that the user has clicked on a face, call FSDK_LockID on that identifier, display an input box, and ask the user for the subject's name. You may continue to run FSDK_FeedFrame in parallel.

iv. If the user entered a name, set it using the FSDK_SetName function. If the user chose to erase the subject, call FSDK_SetName with an empty name. In any case, call FSDK_UnlockID to unlock the identifier.

v. If manually handling identifiers (for example, storing the identifier of each subject to look them up later, or storing images of each subject), call FSDK_GetSimilarIDCount and FSDK_GetSimilarIDList to retrieve identifiers similar to the ID, and store (or compare against) them as well. In addition, call FSDK_GetIDReassignment for every previously stored identifier before comparing against it.

5) If necessary, save tracker memory to a file or a buffer (FSDK_SaveTrackerMemoryToFile, FSDK_SaveTrackerMemoryToBuffer).

5. Free the tracker handle using FSDK_FreeTracker.

6. Close the video camera (FSDK_CloseVideoCamera).
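The sketch below corresponds to step 4 of the scenario. It is a minimal illustration, not a complete application: the frame is assumed to be an HImage obtained from FSDK_GrabFrame or another source, display and UI handling are omitted, the maximum face count is arbitrary, and the declarations are assumed to follow the FaceSDK C/C++ header (LuxandFaceSDK.h) and may differ slightly between versions.

#include "LuxandFaceSDK.h"
#include <cstdio>

// Per-frame handling: feed the frame to the tracker, then query the returned IDs.
void HandleFrame(HTracker tracker, HImage frame)
{
    const int maxFaces = 32;              // arbitrary upper bound for this sketch
    long long ids[maxFaces];
    long long faceCount = 0;

    FSDK_FeedFrame(tracker, 0 /* CameraIdx */, frame, &faceCount, ids, sizeof(ids));

    for (long long i = 0; i < faceCount; ++i) {
        TFacePosition facePosition;
        FSDK_GetTrackerFacePosition(tracker, 0, ids[i], &facePosition);

        char names[1024] = "";
        FSDK_GetAllNames(tracker, ids[i], names, sizeof(names));

        std::printf("ID %lld at (%d, %d): %s\n",
                    ids[i], facePosition.xc, facePosition.yc, names);
    }
    // The application frees the frame (FSDK_FreeImage) once it is no longer needed.
}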

User Interaction with the System

In a typical scenario, a user observes the images from a camera, with faces outlined by rectangles and names displayed under the rectangles. There is an option to tag a subject with a name by clicking its face and entering the name, or to remove the subject from the memory. The software may notify the user when certain previously defined subjects appear. The software may additionally store each image of a subject, and allow browsing a subject's images. The software may store images of untagged subjects as well (and store their ID along with the image), but keep in mind that if the memory limit is reached, earlier appearances of untagged subjects will be purged, and should these subjects appear again, they may be given new ID numbers (unrelated to their old identifiers; see the Tracker Memory section).

The user normally should have control over the MemoryLimit and Threshold parameters to alter the recognition quality and the number of subjects that can be stored in the system.

Enrollment

To enroll a subject, the user is usually only required to click a subject's face and enter the name. If the subject has already been present in front of the camera for a certain time (for example, while approaching the user's desk), it is likely that the API has stored enough facial appearances of the subject to recognize it again. If this is not the case, the subject may be asked to tilt or rotate its head, to walk closer to or further away from the camera, and the lighting can be altered. If the frame rate is especially low, or if environmental conditions change unexpectedly, the API may not recognize the subject in some appearances. In such cases, the user may tag the subject with the same name on several occasions, until enough facial appearances are stored and the subject is consistently recognized.

If you need to ensure that you track a live subject, consider detecting whether the facial expression changes, using the FSDK_GetTrackerFacialAttribute function.

Dealing with false acceptances

The API is designed to return several names with FSDK_GetAllNames for a certain ID. In most cases, the system will return only a single name. If the system returns several names, it means that a false acceptance has occurred. That is, two (or more) subjects became confused.

Although the false acceptance rate is usually low, there is no way to eliminate it completely; instead, the user balances the false acceptance rate against the recognition rate. The software should account for the scenario in which a false acceptance has occurred.

In an access control setting, you may decide to grant access to the subject if any of the recognized names has the appropriate permissions. Alternatively, the software may signal a false acceptance, and the user may decide to set the Threshold parameter to a higher value to lower the probability of the next false acceptance. In that case it is necessary, first, to erase the persons that were confused (by calling FSDK_SetName with an empty name to remove the name, and FSDK_PurgeID to remove all facial appearances of the ID), and then, once the threshold has been set to a higher value, to set their names again.
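The erase step might look like the following sketch (the helper name is this sketch's own; FSDK_SetName and FSDK_PurgeID are used exactly as described above):

#include "LuxandFaceSDK.h"

// Erase a confused subject so it can be tagged again after the Threshold
// parameter has been raised: remove the name tag, then purge the stored
// facial appearances of the ID.
void EraseConfusedSubject(HTracker tracker, long long id)
{
    FSDK_SetName(tracker, id, "");
    FSDK_PurgeID(tracker, id);
}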

Keep in mind that not every false acceptance will result in several names being returned. It is possible that just a single, incorrect name is returned, and the false acceptance goes unnoticed. However, with an appropriate setting of the Threshold parameter, such scenarios are rare.

Note that when there are one or more similar identifiers returned with FSDK_GetSimilarIDList, and these identifiers do not have name tags, this does not always mean a false acceptance. As described in the Understanding Identifiers section, when the memory for an identifier is full, it will not become merged with other identifiers (even if they represent the same subject), so these identifiers will be returned in the list of similar identifiers.

Saving and Loading Tracker Memory

To save the memory of a tracker to a file, use the FSDK_SaveTrackerMemoryToFile function. Alternatively, you may save it to a memory buffer (for example, for later import into a database). You need to call FSDK_GetTrackerMemoryBufferSize to determine the size of the buffer, and then call FSDK_SaveTrackerMemoryToBuffer.

Conversely, to load the memory of a tracker from a file or a buffer, use the FSDK_LoadTrackerMemoryFromFile or FSDK_LoadTrackerMemoryFromBuffer functions. Note that you need to set the tracker parameters again after loading, because a new tracker handle has been created, with parameters set to default values.
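A sketch of the buffer path is shown below; buffer management is the application's, and the declarations are assumed to follow the FaceSDK C/C++ header and may differ slightly between versions.

#include "LuxandFaceSDK.h"
#include <vector>

// Serialize the tracker memory into an application-owned buffer, for example
// to store it in a database. The required buffer size must be queried first.
bool SaveTrackerToBuffer(HTracker tracker, std::vector<unsigned char>& buffer)
{
    long long bufferSize = 0;
    if (FSDK_GetTrackerMemoryBufferSize(tracker, &bufferSize) != FSDKE_OK)
        return false;
    buffer.resize((size_t)bufferSize);
    return FSDK_SaveTrackerMemoryToBuffer(tracker, buffer.data(), bufferSize) == FSDKE_OK;
}

// Recreate a tracker from such a buffer. Remember to set the tracker
// parameters again afterwards: the new handle starts with default values.
bool LoadTrackerFromBuffer(std::vector<unsigned char>& buffer, HTracker* tracker)
{
    return FSDK_LoadTrackerMemoryFromBuffer(tracker, buffer.data()) == FSDKE_OK;
}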

Note that this operation saves only the memory contents of a tracker: stored facial appearances, identifiers, and names. The parameters of a tracker are not saved. Moreover, the internal state of face tracking is not saved either. This means that if, during the main loop (where you call FSDK_FeedFrame), you save the tracker to a file and then immediately reload it, the operation will disrupt face tracking. Because of this, the subsequent recognition results you receive will be different (compared to when such an operation was not performed), and the parameters will be reset to defaults. Also, you will not be able to retrieve the face position, eye coordinates, facial feature coordinates, or the list of similar identifiers immediately after loading. However, after the next FSDK_FeedFrame call, face tracking resumes, and the aforementioned functions operate normally.

Recognition Performance

The performance of face recognition (i.e. how often a subject is recognized, and how often different subjects are confused) is controlled with the Threshold and MemoryLimit parameters. The higher the Threshold parameter (and the lower the MemoryLimit parameter), the less often a subject will be recognized, and the less often confusions will occur.

Performance measures

Tracker API employs two performance measures: false acceptance rate (FAR) and recognition rate (R). FAR measures the rate of confusing different subjects (that is, assigning the same identifier value to different subjects) over a certain number of storage events, once the memory becomes full. R measures the rate of recognizing a person after tagging, once all available memory is full.

Understanding storage events

When calculating FAR, one could just count how often false acceptances occur during a certain time interval (for example, an hour). However, such a measure will vary greatly across different kinds of video footage.

For example, in an office setting, when subjects are sitting at their desks, and change their positions or facial expressions rather slowly, almost every frame will be very similar to the previous one. Therefore, the API will store novel facial appearances at a slow pace. If there were no false acceptances on a previous frame, they are very unlikely to occur on the next; therefore we expect false acceptances to occur rather rarely.

On the other hand, in an active setting (when many novel subjects appear in front of the camera, move around, and disappear from view), we expect the system to store novel facial appearances quite often, because many subjects appear in previously unseen views. Therefore, we expect false acceptances to occur more often, because of the faster pace of the video.

To employ a rate that is meaningful in both settings, we instead measure time not in seconds, but in storage events. For example, in the office setting, at 12 frames per second, we may get only 400 storage events during an hour, and in the active setting we may get 3600 storage events during an hour. We measure FAR at an interval of 2000 storage events, which could be roughly equal to 5 hours of a hypothetical less active setting, or 32 minutes of an active setting. It is important to note that as facial appearances of a subject accumulate, the rate of storage events will slow down, since there will be fewer novel facial appearances.

How to measure your rate of storage events

To measure the rate of storage events in your setting, call FSDK_GetTrackerParameter with the MemorySize parameter during the main loop. Each time a storage event occurs, the MemorySize parameter increases. As your video progresses, you may calculate how much time will be needed to reach 2000 storage events. Note that when the memory is full, storage events themselves still occur, but nothing is stored; this does not mean that FAR becomes zero. You should estimate the rate of storage events before the memory is full.
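A minimal sketch of this measurement follows; the polling strategy and helper name are the application's own, and the parameter value is assumed to be returned as a string, as with other tracker parameters.

#include "LuxandFaceSDK.h"
#include <cstdlib>

// Read the current number of stored facial appearances. Polling this value,
// for example once per second, and differencing it over time gives the rate
// of storage events in your setting.
long long GetStoredAppearanceCount(HTracker tracker)
{
    char value[64] = "";
    if (FSDK_GetTrackerParameter(tracker, "MemorySize", value, sizeof(value)) != FSDKE_OK)
        return -1;
    return std::atoll(value);
}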

Understanding FAR

FAR is the rate of assigning different subjects the same identifier value. The rate tells how often a certain subject (say, John) will be falsely accepted as any other subject. For example, if FAR is 0.001, John can expect a 0.001 probability of being falsely accepted as some other subject. However, if there are 10 subjects in the system, such a rate applies to every one of them. Therefore, it is practical to know the probability that at least one false acceptance occurs among any of them. Such a rate can be calculated as

1 – (1 – FAR)^(N*(N-1)/2)

where N is the number of subjects. For example, at FAR=0.001 and N=10, there is a 4.4% chance that at least one false acceptance will occur during the 2000 storage events considered. To have a 1% rate with 10 subjects, FAR should not exceed 0.0003.
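The numbers above can be reproduced with a small calculation (plain C++, mirroring the example values):

#include <cmath>
#include <cstdio>

// Probability of at least one false acceptance among N subjects over the
// considered 2000 storage events: 1 - (1 - FAR)^(N*(N-1)/2).
int main()
{
    const double farRate = 0.001;
    const int n = 10;
    const double pairs = n * (n - 1) / 2.0;                  // 45 pairs of subjects
    const double pAtLeastOne = 1.0 - std::pow(1.0 - farRate, pairs);
    std::printf("%.3f\n", pAtLeastOne);                      // prints about 0.044 (4.4%)
    return 0;
}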

Understanding R

R is the rate of recognizing a subject after it has been tagged a single time and all memory available for the subject has become full. A subject is successfully recognized if its name is present among the names returned by FSDK_GetAllNames. R is measured from 0 to 1, which translates to recognition in 0% to 100%, respectively, of the frames received by FSDK_FeedFrame.

R depends mainly on the amount of memory available for each subject. For example, if there are 30 subjects in your system, and you allow 20 units of memory for each subject, your memory limit should be (30+1)*20=620.

Choosing Threshold value

To choose the Threshold value, refer to the tables below. You should consider the maximum number of subjects to be tagged within your system, and the maximum memory per subject.

Generally, the higher the MemoryLimit is set, the higher the FAR will be (once all available memory has been used).

Note that higher Threshold values together with a higher memory amount allow higher recognition rate only when enough facial appearances of an identifier have been accumulated. If there are sudden changes in facial appearance (due to low frame rate or environmental factors, for example), it may require more time to capture enough facial appearances with a higher Threshold value.

The tables below show the expected false acceptance rate and recognition rate.

False Acceptance Rate at Threshold and MemoryLimit

Threshold     MemoryLimit
              350        700        1750       3500       5250       7500
0.992000      0.000081   0.000130   0.000231   0.000266   0.000277   0.000277
0.993141      0.000066   0.000107   0.000183   0.000209   0.000216   0.000216
0.994283      0.000062   0.000089   0.000144   0.000166   0.000170   0.000170
0.995424      0.000052   0.000068   0.000101   0.000114   0.000118   0.000118
0.996566      0.000042   0.000050   0.000072   0.000077   0.000081   0.000081
0.997707      0.000036   0.000040   0.000054   0.000055   0.000056   0.000056
0.998849      0.000030   0.000034   0.000045   0.000039   0.000039   0.000039
0.999990      0.000002   0.000007   0.000009   0.000012   0.000014   0.000023

Recognition Rate at Threshold and Memory per subject

Threshold     Memory per subject
              5        10       15       21
0.992000      0.995    0.999    0.999    0.999
0.993141      0.994    0.999    0.999    0.999
0.994283      0.993    0.998    0.999    0.999
0.995424      0.991    0.998    0.998    0.998
0.996566      0.986    0.997    0.997    0.997
0.997707      0.978    0.995    0.996    0.996
0.998849      0.956    0.986    0.988    0.988
0.999990      0.073    0.087    0.107    0.138

For example, let us assume that you have 30 subjects in an office setting, your frame rate is 12 frames per second, and you decide to allow 21 units of memory per subject. Therefore, your memory limit is (30+1)*21 = 651 (see the formula in the “Memory available for each subject” section). You decide to aim for a FAR of 0.000050 and calculate that, with 30 subjects, there is a 2.2% chance that a subject will be given the ID of another subject (see the formula in the Understanding FAR section) during 2000 storage events (approximately 5 hours in an office setting). To have a FAR of 0.000050 with MemoryLimit=700 (the value closest to 651 in the table), you choose Threshold=0.996566. You note that at this threshold and 21 units of memory per subject, the recognition rate is 0.997 (meaning subjects will be recognized in 99.7% of the frames of the video).

Note: it is not recommended to use Threshold higher than 0.999, since it will make Tracker API recognize faces less often.

Gender, Age and Facial Expression Recognition

The API allows for identifying the gender, age, and facial expression of a face by using the FSDK_GetTrackerFacialAttribute function.

To detect gender, you need to set the DetectGender tracking parameter to true. The function returns confidence levels for each gender (male and female) in the output string. You can parse this string using the FSDK_GetValueConfidence function.

To detect age, you need to set the DetectAge tracking parameter to true. The function then returns the age of the face in the output string.

To detect expression, you need to set the DetectExpression tracking parameter to true. The function returns confidence levels for each expression (if a smile is present and if the eyes are open or closed) in the output string. You can parse this string using the FSDK_GetValueConfidence function.

The confidence level for each attribute, returned by the FSDK_GetTrackerFacialAttribute function, varies from 0 to 1 (except for the “Age” attribute, for which the age itself, not a confidence level, is returned). When recognizing gender, you may assume that the recognized gender is the one with the higher confidence level.

If your system should respond to the particular gender of a novel subject (for example, when advertising separate products to male and female visitors), consider waiting for about a second after the subject has first appeared, so that the gender is recognized with higher accuracy. You may also consider responding only when the confidence level not merely exceeds 0.5, but exceeds a certain threshold (for example, 0.7 or 0.9, which translate to 70% or 90% accuracy).

If your system should respond to the particular expression of a subject (for example, taking a picture only when a person smiles and the eyes are open), consider waiting for about 0.5 seconds after the subject has appeared. To find out if the expression is present, it is usually optimal to compare the confidence in the attribute value with the 0.5 threshold (i.e., if the confidence in the “Smile” value is greater than 0.5, the person smiles, and if the confidence in the “EyesOpen” value is greater than 0.5, the eyes are open). You may use a higher threshold for greater certainty, but in this case some expressions may not be detected.
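A sketch of reading these attributes for one tracked ID is shown below. The attribute names “Gender” and “Expression” and the value names “Male”, “Smile” and “EyesOpen” follow the description above and the FaceSDK documentation; verify the exact strings against your SDK version.

#include "LuxandFaceSDK.h"
#include <cstdio>

// Read gender and expression confidences for one tracked ID. DetectGender and
// DetectExpression must be set to true for these attributes to be available.
void PrintAttributes(HTracker tracker, long long id)
{
    char values[1024] = "";
    float male = 0, smile = 0, eyesOpen = 0;

    if (FSDK_GetTrackerFacialAttribute(tracker, 0 /* CameraIdx */, id,
                                       "Gender", values, sizeof(values)) == FSDKE_OK) {
        FSDK_GetValueConfidence(values, "Male", &male);
        std::printf("Male confidence: %.2f\n", male);
    }

    if (FSDK_GetTrackerFacialAttribute(tracker, 0, id,
                                       "Expression", values, sizeof(values)) == FSDKE_OK) {
        FSDK_GetValueConfidence(values, "Smile", &smile);
        FSDK_GetValueConfidence(values, "EyesOpen", &eyesOpen);
        if (smile > 0.5f && eyesOpen > 0.5f)
            std::printf("Smiling with open eyes\n");
    }
}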

Note that gender, age and expression recognition requires the detection of facial features, so facial features will be detected regardless of the DetectFacialFeatures parameter value. This will decrease the frame rate. You might consider setting the RecognizeFaces parameter to false if you only need to detect gender, age or expressions and do not need recognition of the subjects’ identities, which will increase the frame rate.

Face, Eye and Facial Feature Tracking

Tracker API supports the tracking of faces, eye centers, and facial features in addition to the recognition of a subject's identity. Use the FSDK_GetTrackerFacePosition, FSDK_GetTrackerEyes and FSDK_GetTrackerFacialFeatures functions to retrieve the corresponding coordinates. You also need to set the DetectEyes or DetectFacialFeatures parameter to true when tracking eyes or facial features, respectively. Tracker API performs smoothing of facial features (see the SmoothFacialFeatures parameter).

When you only need to track faces, and do not need to recognize subjects’ identities, you can disable face recognition to improve performance. To accomplish that, you need to set the RecognizeFaces parameter to false.

Counting the number of people

You should not estimate the number of people the system has observed based on the values of the identifiers, since some of them may have been merged with others. Instead, you may retain all the ID values returned by Tracker API and, at the point when the number of people should be estimated, replace each ID with the value returned by the FSDK_GetIDReassignment function. Then, you can count the number of distinct identifiers in the list. Note that if the memory limit is reached, some untagged identifiers may be purged, and the number of people may be overestimated. See the User Interaction with the System section for details.

If each subject captured by the camera appears only once, you may consider not determining the subject's identity (set RecognizeFaces to false). Then, the ID value returned by the API will be equal to the total number of continuous facial sequences, which is approximately the number of people who appeared in front of the camera.
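A sketch of such counting follows (the list of observed IDs is accumulated by the application from FSDK_FeedFrame results; the helper name is this sketch's own):

#include "LuxandFaceSDK.h"
#include <set>
#include <vector>

// Estimate how many distinct people were observed: map every stored ID to its
// current value (merged IDs collapse into one) and count the distinct values.
// Purged untagged IDs may still cause overestimation, as noted above.
size_t CountObservedPeople(HTracker tracker, const std::vector<long long>& observedIDs)
{
    std::set<long long> distinct;
    for (long long id : observedIDs) {
        long long current = id;
        FSDK_GetIDReassignment(tracker, id, &current);
        distinct.insert(current);
    }
    return distinct.size();
}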

Thread Safety

All tracker functions are thread safe. Note that you should avoid calling FSDK_FeedFrame simultaneously on the same tracker from several threads, since it will disrupt the FSDK_GetTrackerEyes, FSDK_GetTrackerFacialFeatures, FSDK_GetTrackerFacePosition, FSDK_GetTrackerFacialAttribute, FSDK_GetSimilarIDCount, FSDK_GetSimilarIDList and FSDK_GetAllNames functions. The reason is that the ID received from FSDK_FeedFrame must be passed to these functions before the next FSDK_FeedFrame is executed with the following frame; otherwise these functions may not perform correctly.

 

Next chapter: Tracker API Functions
