On September 25, 2017, Baidu open-sourced all the code and scripts of its mobile deep learning framework, mobile-deep-learning (MDL), on GitHub, hoping that the community will help drive the project forward.
Foreword
Deep learning has influenced the direction of the Internet industry, and technology news carries more discussion of deep learning and neural networks every day. The technology has developed rapidly over the past two years, Internet products are competing to adopt it, and its arrival in products will increasingly affect people's lives. With mobile devices in widespread use, applying deep learning and neural networks in mobile Internet products has become an inevitable trend.
Image technology, which is closely tied to deep learning, is also widely used in industry. The combination of traditional computer vision and deep learning has driven its rapid development.
Applications of Deep Learning on Mobile
Baidu application cases
The typical deep learning technique used on mobile is the CNN (Convolutional Neural Network). mobile-deep-learning (MDL) is a mobile framework based on convolutional neural networks.
What are MDL's main applications on mobile? The common ones are telling what the object in a picture is, that is, classification, and identifying where the objects in a picture are and how large they are, that is, subject (main body) detection.
The app shown below is called Shixiang (rendered literally as "pick phase"), and can be found in the Yingyongbao app store on Android. It automatically classifies photos for the user, a feature that is very attractive to anyone with a large photo collection.
In addition, tapping the image search entry on the right side of the search box in Mobile Baidu opens the image search interface shown below. When the user turns on the auto-shoot switch (marked in the figure below) under the general vertical category, the app automatically finds the object and draws a frame around it as soon as the hand holds still, then initiates an image search directly, with no need to take a photo. The whole process feels smooth because the user never has to shoot manually. The framing step is a typical use of deep learning subject detection, and it runs on the mobile-deep-learning (MDL) framework. MDL has now shipped in several versions of Mobile Baidu, and its reliability has improved substantially over several iterations.
Other cases in the industry
The Internet industry has more and more cases of neural networks applied on mobile.
There are currently two main approaches. The first runs the neural network entirely on the client. Its advantage is obvious: no network round trip is needed, so if speed can be guaranteed, the user experience is very smooth. If the neural network runs efficiently on the device, users do not notice any loading at all. The photo-classification app and the Mobile Baidu image search described above are examples of running a neural network on the device with no dependence on the network.
The second approach runs the neural network computation over the Internet, with the client responsible only for UI display. Before on-device neural networks became practical, most apps worked this way, computing on the server and displaying on the client. The advantage of this approach is that it is relatively easy to implement and cheaper to develop.
To make the two approaches more concrete, consider two flower-identification apps, Shihua ("flower recognition") and Xingse ("shape and color"). Both use a typical classification method and can be found in the iOS App Store. The figure below is a picture of a lotus; both apps classify it well. You can install the two apps and, from how they behave, work out which of the two approaches each one uses.
Shihua
Many flower-identification apps have appeared over the past year. Microsoft's Shihua is a flower-identification app released by Microsoft Research Asia. After the user photographs a flower and selects it, the app returns information about it. Accurate flower classification is a highlight of its publicity.
With the Xingse app, you only need to photograph a plant (flower, grass, or tree) and the app quickly gives its name, along with plenty of interesting plant knowledge: the plant's other names, its meaning in the language of flowers, related classical poetry, plant culture, interesting stories, and care instructions. There is a lot to learn from reading it.
Difficulties of applying deep learning on mobile
Because of technical barriers and hardware constraints, there have long been few successful cases of deep learning on mobile. Traditional mobile UI engineers who set out to write neural network code find very little reference material on mobile deep learning. At the same time, Internet competition is intense and the first mover wins: whoever takes the lead in applying deep learning on mobile can seize the opportunity ahead of others.
The computing power of a mobile device is very weak compared with a PC. Because mobile CPUs must keep power consumption very low, their performance is constrained. Running a neural network inside an app makes the CPU workload surge, so balancing power consumption against performance is essential.
The Baidu image search client team began researching applications of deep learning on mobile at the end of 2015. The challenges were eventually solved one by one, and today the code runs in many apps, some with billion-level page views (PV) and others still in their startup stage.
Applying deep learning on mobile is hard in itself, and in a product at the scale of Mobile Baidu the team also had to cope with a wide variety of device models and hardware, as well as Mobile Baidu's own metric requirements. Making neural network technology run stably and efficiently was the biggest test, and breaking the problem down was the mobile team's first task. A simple summary showed that the problems and difficulties are easiest to present by comparing mobile with the server side, so the following compares deep learning deployment on the server and on the client.
Difficulty: server side vs. mobile side
Memory: server side weakly limited; mobile side limited
Power consumption: server side unrestricted; mobile side strictly limited
Dependent library size: server side unrestricted; mobile side strongly limited
Model size: server side conventionally 200 MB and up; mobile side should not exceed 10 MB
Performance: server side powerful GPU boxes; mobile side CPU and GPU
During development the team gradually resolved these difficulties, and the result is today's MDL deep learning framework. So that more mobile engineers can reuse this wheel quickly and focus on their business, Baidu has open-sourced all the related code, and anyone in the community is welcome to join in building it.
MDL framework design
For a mobile deep learning framework, we took into account the characteristics of mobile applications and their operating environment, and set strict requirements for speed, size, and resource usage, because each of these metrics has a significant impact on user experience.
Scalability, robustness, and compatibility were also considered from the start of the design. For scalability, we abstracted the layer concept so that specific layer types can be implemented as a model requires; we expect MDL to support more network models through the addition of new layer types. For robustness, MDL uses a reflection mechanism to throw C++-layer exceptions up to the application layer, which catches and handles them, for example by collecting exception information in logs. For compatibility, we provide a tool script that converts Caffe models to MDL; with a single command the user can complete model conversion and quantization. We will go on to support converting PaddlePaddle and TensorFlow models to MDL, so that more types of models are compatible.
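The layer abstraction described above can be illustrated with a minimal sketch. This is not MDL's actual C++ code: the names Layer, LAYER_TYPES, register, build_network, and UnsupportedLayerError are hypothetical, and the two toy layers stand in for real ones such as convolution or pooling. It only shows the idea that new layer types plug into a common interface, and that an unsupported layer surfaces as an exception the application layer can catch and log.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Layer(ABC):
    """Abstract layer: every concrete layer type implements forward()."""

    @abstractmethod
    def forward(self, values: List[float]) -> List[float]:
        ...


# Hypothetical registry mapping a layer type name to its class.
LAYER_TYPES: Dict[str, Type[Layer]] = {}


def register(type_name: str):
    """Class decorator that adds a layer class to the registry."""
    def decorator(cls: Type[Layer]) -> Type[Layer]:
        LAYER_TYPES[type_name] = cls
        return cls
    return decorator


@register("relu")
class Relu(Layer):
    def forward(self, values: List[float]) -> List[float]:
        return [max(0.0, v) for v in values]


@register("scale")
class Scale(Layer):
    def forward(self, values: List[float]) -> List[float]:
        return [2.0 * v for v in values]


class UnsupportedLayerError(Exception):
    """Raised so the calling (application) layer can catch and log it."""


def build_network(type_names: List[str]) -> List[Layer]:
    network = []
    for name in type_names:
        if name not in LAYER_TYPES:
            raise UnsupportedLayerError(f"unsupported layer type: {name}")
        network.append(LAYER_TYPES[name]())
    return network


def run(network: List[Layer], values: List[float]) -> List[float]:
    # Feed the output of each layer into the next one.
    for layer in network:
        values = layer.forward(values)
    return values
```

Supporting a new model then means registering the layer types it needs; the code that builds and runs the network stays unchanged.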
The overall architecture of the MDL framework is as follows:
The MDL framework mainly consists of the model conversion module (MDL Converter), the model loading module (Loader), the network management module (Net), the matrix operation module (Gemmers), and the JNI interface layer (JNI Interfaces) for Android. The model conversion module converts Caffe models to MDL models and can quantize 32-bit floating-point parameters into 8-bit parameters, which greatly reduces model size. The model loading module handles dequantization, load validation, and network registration. The network management module initializes and manages the layers of the network. For Android, MDL provides a JNI interface layer, through which developers can easily complete the loading and prediction process.
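The quantization and dequantization steps mentioned above can be sketched as follows. This is a minimal illustration, assuming a simple linear min/max scheme; the source does not say which scheme MDL actually uses, and the function names quantize and dequantize are hypothetical.

```python
from typing import List, Tuple


def quantize(weights: List[float]) -> Tuple[bytes, float, float]:
    """Map float weights onto 8-bit unsigned integers in the range 0..255."""
    lo, hi = min(weights), max(weights)
    # Step between adjacent 8-bit levels; fall back to 1.0 if all weights are equal.
    scale = (hi - lo) / 255.0 or 1.0
    quantized = bytes(min(255, round((w - lo) / scale)) for w in weights)
    return quantized, lo, scale


def dequantize(quantized: bytes, lo: float, scale: float) -> List[float]:
    """Recover approximate float weights at load time."""
    return [lo + q * scale for q in quantized]


weights = [-1.0, 0.0, 0.5, 1.0]
quantized, lo, scale = quantize(weights)
restored = dequantize(quantized, lo, scale)
```

Each parameter shrinks from 4 bytes to 1 byte, so the stored weights take roughly a quarter of the space, at the cost of a rounding error of at most half a quantization step per weight.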
MDL positioning: simple and usable
The MDL open-source project has had a clear positioning from the start. In mobile platform R&D, where device performance is limited, it is very attractive to find the right scenario for new deep learning technology and apply it to your own product. But if every mobile engineer had to rewrite the whole neural network implementation in order to apply deep learning, the cost would rise considerably. MDL is positioned to make using and deploying a neural network simple: if you only use the basic functions, little configuration or modification is needed, and you do not even have to compile a machine learning library. You only need to focus on your specific business and on how to use the framework.
MDL's simple and clear code structure can also serve as learning material and as a reference for deep learning R&D engineers. Besides cross-compiling for mobile platforms, MDL also compiles on Linux and Mac x86, so while adjusting deep learning code you can build and run it directly on your workstation without deploying to an ARM platform. All it takes is a few simple lines; see the MDL GitHub README.
# https://github.com/baidu/mobile-deep-learning
# mac or linux:
./build.sh mac
cd build/release/x86/build
A complex build process often takes longer than the development itself. In MDL, a single line, ./build.sh android, is enough to produce the .so library for testing, so deployment is very simple.
MDL performance and compatibility
Size: armv7, 300 KB+
Speed: on the iOS GPU, mobilenet can reach 40 ms and squeezenet can reach 30 ms
From project start to open source, MDL has been iterated for more than a year. On mobile the metrics that matter most are size, power consumption, and speed. Before adopting it, Baidu's internal product lines compared it many times against related projects that were already open source. MDL maintains speed and energy efficiency while supporting a variety of deep learning models, such as mobilenet, googlenet v1, and squeezenet, and it has an iOS GPU version on which a squeezenet run can be as fast as 30 to 40 ms.
Comparison with similar frameworks
Framework: Caffe2, TensorFlow, ncnn, MDL (CPU), MDL (GPU)
Hardware: CPU, CPU, CPU, CPU, GPU
Speed: slow, fast, fast
Size: large, small
Compatibility: Android & iOS, Android & iOS, Android & iOS, Android & iOS, iOS
Compared with other mobile frameworks that support CNNs, MDL is fast, stable, and broadly compatible, and it ships with complete demos.
MDL runs stably on both iOS and Android. On iOS 10 and above it uses the GPU computing API and performs very well; on Android it runs purely on the CPU. In how well it runs on high-end, mid-range, and low-end device models, and in its coverage through Mobile Baidu and other apps, it has an absolute advantage.
MDL also supports converting Caffe models directly to MDL models.
MDL features at a glance
When mobile AI research and development began, the Baidu image search team compared most of the CNN frameworks that were already open source. Many approaches were flourishing, but they also exposed a number of problems. Some frameworks showed excellent experimental numbers yet performed poorly or very unstably in real products; others could not cover all device models, or were too large to meet release standards. To avoid these problems, MDL offers the following features:
One-command deployment; a script parameter switches between iOS and Android
Automatic conversion of Caffe models to MDL models
GPU execution
Tested to run the MobileNet, GoogLeNet v1, and squeezenet models
Very small, with no third-party dependencies; written entirely by hand
A quantization script that converts 32-bit float directly to 8-bit uint; quantized models are around 4 MB
Repeated online and offline exchanges with ARM's algorithm team; optimization for the ARM platform will continue
NEON usage covers convolution, normalization, pooling, and other operations
Loop unrolling: to improve performance and cut unnecessary CPU consumption, all conditional checks are unrolled
Many heavy computational tasks are moved forward into the overhead (setup) phase
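Loop unrolling, one of the optimizations listed above, can be shown with a small sketch. The function below is hypothetical and is not taken from MDL, whose kernels are written in C++ with NEON; Python is used here only to show the shape of the technique: four multiply-accumulate steps per iteration with independent accumulators, followed by a loop for the leftover elements.

```python
from typing import List


def dot_unrolled(a: List[float], b: List[float]) -> float:
    """Dot product with the main loop unrolled four times."""
    n = len(a)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    acc0 = acc1 = acc2 = acc3 = 0.0
    i = 0
    while i < limit:
        # Four independent accumulations per loop-condition check.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    total = acc0 + acc1 + acc2 + acc3
    # Handle the remaining 0 to 3 elements.
    for j in range(limit, n):
        total += a[j] * b[j]
    return total
```

In compiled code the benefit comes from fewer branch checks and from accumulators that map naturally onto SIMD lanes; in Python the sketch is illustrative only.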
Follow-up plans
To reduce MDL's size further, MDL does not use protobuf to store model configuration and uses JSON instead. MDL currently supports converting Caffe models to MDL models, and in the future it will support converting all mainstream model formats to MDL models.
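To illustrate why a JSON model description keeps dependencies small, here is a minimal sketch that reads a network description with only a standard JSON parser. The schema is hypothetical: the field names "layers", "name", "type", "kernel", and "stride" are assumptions made for this example and are not MDL's real format.

```python
import json
from typing import List

# Hypothetical model description; the field names are illustrative only.
MODEL_JSON = """
{
  "layers": [
    {"name": "conv1", "type": "conv", "kernel": 3, "stride": 1},
    {"name": "relu1", "type": "relu"},
    {"name": "pool1", "type": "pool", "kernel": 2, "stride": 2}
  ]
}
"""


def load_layer_types(text: str) -> List[str]:
    """Parse the description and return layer types in execution order."""
    config = json.loads(text)
    return [layer["type"] for layer in config["layers"]]


layer_types = load_layer_types(MODEL_JSON)
```

A description like this is parsed with a few lines of code and no generated classes, which is the kind of saving that avoiding protobuf is meant to provide.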
As the computing performance of mobile devices improves, the GPU will take on a very important role in mobile computing, and MDL attaches great importance to its GPU implementation. MDL currently supports GPU execution on iOS, available on devices running iOS 10 and above. Current statistics show that iOS 10 and above already covers most iOS devices; devices below iOS 10 can fall back to CPU execution. On Android, although GPU computing power is currently weaker overall than the CPU, the GPUs in newly released models are becoming increasingly powerful. MDL will add an Android GPU feature later: GPU computing based on OpenCL will lift performance on high-end Android models to a higher level.
Developers are welcome to contribute code
Stable and efficient neural networks on mobile depend on the coding work of many developers. MDL's long-standing aim is to be reliable and practical rather than showy, in the hope of contributing to mobile deep learning. People of insight are warmly welcome to join, so that deep learning can be applied widely on mobile and spread far and wide.
Finally, here once again is MDL's GitHub repository: https://github.com/baidu/mobile-deep-learning