Rejecting AttGAN, results from Voice anonymization and hiding faces

In this post, I have listed down the status of AttGAN, results from SoX module and results of hiding faces.

Face Anonymization

The AttGAN did not work at all on the real images. The outputs were very unnatural. Hence, AttGAN was discarded.

Prof Mark suggested another face anoymization method. The idea is to conceal the face using a oval/box. This is simple to implement using OpenCV’s haar cascade detectors. It was found to work well on demo videos. However, there were one or two frames where the face was not detected due to occlusion or tilted pose. In such cases, we utilize the face locations from previous frames as the faces do not move rapidly.

The face detector needs two parameters: scaleFactor and minNeighbors. scaleFactor was found to be sensitive, hence we keep it constant and alter the minNeighbors.

We begin with minNeighbors = 1. In this case, several candidates will be detected which are not actually faces. Hence, we increase the minNeighbors by 2, till the number of candidates is 0 or 1. This method was found to work well and is proper automation of parameter selection.

Voice Anonymization

We use the sox module for python. moviepy module was used for handling video and audio. There are a large variety of effects offered by sox, and each of them has some parameters.

As a general observation, the pitch was found to have a dominant role in anonymizing. Pitch itself was found to be sufficient.

I think, any efforts beyond this for audio will be good but not necessary. Hence, we can look up more advanced methods when balancing the time constraints.

Links to demo files

(Only accessible to restricted accounts) Hinding face with box: https://drive.google.com/drive/folders/12ENDkre79iZIsDeuI346leWY6KTpAupV?usp=sharing

Audio anonymization: https://drive.google.com/drive/folders/19YRPBMN9QQnaDQ19LmqLdifR5NrTphSJ?usp=sharing

Original Videos: https://drive.google.com/drive/folders/1pHqdtWfV-H8jTUE7zu8scZ_Pz-DMvxE-?usp=sharing

Next Steps

I searched a good amount of repositories for face attribute editing.

Here is a list of papers on face attribute editing: https://github.com/clpeng/Awesome-Face-Forgery-Generation-and-Detection#attribute-manipulation

The following repositories were found to have something useful with our task:

https://github.com/dvlab-research/Facelet_Bank
https://github.com/zllrunning/face-makeup.PyTorch

I will try making applying them on demo videos.

I checked some details of Vid2vid. Face->edge->face might be possible. I also ran their edge->face converter. However, their code is big, hence it will take some time to understand the details and how to adapt it for custom data. But, with some efforts, face-edge-face does seem to be a feasible option.

Regarding body-pose-body: I think it will be really difficult because they made their own dataset by downloading videos from youtube. For the same reason, they have not provided their pretrained weights or dataset for body-pose-body. However, since we need to simply change the clothes, I am investigating a specific problem statement that just replaces clothes. For example: https://github.com/kishorkuttan/Deep-Virtual-Try-On . I am still searching for a feasible code for usage.

Candidate Virtual-try-on repositories:

https://github.com/kishorkuttan/Deep-Virtual-Try-On
https://github.com/levindabhi/SieveNet

There were some more discussions over e-mail for how audio anonymization has to be presented and its further course. I will detail them once we begin working on them.

Far future thoughts

There seems an interesting work: https://github.com/run-youngjoo/SC-FEGAN , where you can manually edit photos. But the editing is a tedious task(like editing on a paint application), hence we may not use it at all unless needed specifically.