ARTS Week 23

KeepNewbie_yan
5 min readJan 30, 2022

Algorithm

This week’s LeetCode problem is 50. Pow(x, n)

Implement pow(x, n), which calculates x raised to the power n (i.e., xn).

Input:x = 2.00000, n = 10
Output:1024.00000
Input:x = 2.00000, n = -2
Output:0.25000
Explanation:2^-2 = (1/2)^2 = 1/4 = 0.25

In order to use the dichotomy method, it is necessary to first ensure that n is always positive, so it is necessary to convert xn. First, judge n. If n < 0, the n-th power of x is equal to pow(1/x, -n). When n = 0 or 1, pow(x, n) is always x, only when n>1 , the nth power of x is to calculate pow(x,n). In the function calPow(double x, long n) that actually calculates xn, if n==0, it returns 1, and the result tmp of the xn/2 is obtained by recursion first, if n is Even, the final result is tmp*tmp; if n is odd, then the final result is tmp*tmp*x, the code is as follows:

class Solution {
public double myPow(double x, int n) {
if (x == 0 || x == 1) {
return x;
}
if (n > 0) {
return calPow(x, n);
} else if (n < 0) {
return calPow(1/x, -n);
} else {
return 1;
}
}
// use long type: because abs(Integer.MIN_VALUE) > abs(Integer.MAX_VALUE)
public double calPow(double x, long n) {
if (n == 0) {
return 1;
}
double tmp = calPow(x, n/2);
if (n % 2 == 0) {
return tmp * tmp;
} else {
return tmp * tmp * x;
}
}
}

Review

This week’s Review is for the following article: C Runtime Overhead

The author introduces what to do when the C language library (libc) becomes the bottleneck of program execution.

The author traced an empty program with strace -tt and found that it took up to 9.4ms between executions on his machine, which is entirely the overhead of the linker and the C language runtime

// strace output for `int main() { return 0; }`00:32:27.539873 execve("./overhead", ["./overhead"], [/* 15 vars */]) = 0
00:32:27.541060 brk(0) = 0xa7b000
00:32:27.541466 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
00:32:27.541834 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
00:32:27.542199 fstat(3, {st_mode=S_IFREG|0644, st_size=36586, ...}) = 0
00:32:27.542438 mmap(NULL, 36586, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4f640be000
00:32:27.542865 close(3) = 0
00:32:27.543109 open("/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
00:32:27.543455 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\ 0`\1\2\0\0\0\0\0"..., 832) = 832
00:32:27.543782 fstat(3, {st_mode=S_IFREG|0755, st_size=1984416, ...}) = 0
00:32:27.544162 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4f640bd000
00:32:27.544466 mmap(NULL, 3813200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4f63b03000
00:32:27.544796 mprotect(0x7f4f63c9c000, 2097152, PROT_NONE) = 0
00:32:27.545104 mmap(0x7f4f63e9c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x199000) = 0x7f4f63e9c000
00:32:27.545503 mmap(0x7f4f63ea2000, 16208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4f63ea2000
00:32:27.545775 close(3) = 0
00:32:27.546087 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4f640bc000
00:32:27.546386 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4f640bb000
00:32:27.546664 arch_prctl(ARCH_SET_FS, 0x7f4f640bc700) = 0
00:32:27.547373 mprotect(0x7f4f63e9c000, 16384, PROT_READ) = 0
00:32:27.548209 mprotect(0x7f4f640c7000, 4096, PROT_READ) = 0
00:32:27.548533 munmap(0x7f4f640be000, 36586) = 0
00:32:27.548949 exit_group(0) = ?
00:32:27.549272 +++ exited with 0 +++

Then the author tries to use inline assembly and -ffreestanding -nostdlib to test the check:

// gcc -m32 -ffreestanding -nostdlib
void _start() {
/* exit system call */
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80"
);
}
// strace -tt output
// 00:41:38.884122 execve("./blank", ["./blank"], [/* 15 vars */]) = 0
// 00:41:38.884552 exit(0) = ?
// 00:41:38.884629 +++ exited with 0 +++

It can be seen that this program without any system calls only runs for 0.5ms. Then the author tried to compile the program statically, and the running time was reduced to about 8ms. The output obtained from the trace is as follows:

00:35:19.095150 execve("./main-static", ["./main-static"], [/* 15 vars */]) = 0
00:35:19.095973 uname({sys="Linux", node="vagrant-arch", ...}) = 0
00:35:19.096373 brk(0) = 0x16cf000
00:35:19.096724 brk(0x16d01c0) = 0x16d01c0
00:35:19.097065 arch_prctl(ARCH_SET_FS, 0x16cf880) = 0
00:35:19.097387 readlink("/proc/self/exe", "/vagrant/level0/main-static", 4096) = 27
00:35:19.097779 brk(0x16f11c0) = 0x16f11c0
00:35:19.098140 brk(0x16f2000) = 0x16f2000
00:35:19.098539 fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
00:35:19.098837 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d5a7cb000
00:35:19.099192 read(0, "Collaboratively administrative emp"..., 4096) = 1398
00:35:19.099544 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
00:35:19.099893 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8d5a7ca000

Finally, the author wrote a minimal libc library to compile with the -ffreestanding -nostdlib flag, and the running time was reduced to 1ms, which is close to 1/ the running time of glibc 10, which has only five system calls:

00:53:41.882305 execve("./main-43.1", ["./main-43.1"], [/* 15 vars */]) = 0
00:53:41.882920 read(0, "Collaboratively administrative emp"..., 4096) = 1398
00:53:41.882989 read(0, "", 4096) = 0
00:53:41.883006 write(1, "Collaboratively administratively emp"..., 1449) = 1449
00:53:41.883023 _exit(0) = ?
00:53:41.883356 +++ exited with 0 +++

Summary: If you feel that the program or process startup time is slow and really needs to be optimized from the library level, then it is worth your time to try another libc implementation (such as musl libc or Diet libc)

Tip

In the example of callback function in C language. the callback function is to treat one function as a parameter of another function. The most classic application is to customize the function of comparison size in the sorting algorithm. The following is a simple example, the operate function calls the add and multi functions to achieve the effect of addition and multiplication respectively.

#include <stdio.h>int operate(int a, int b, int (* func) (int a, int b));
int add(int a, int b);
int multi(int a, int b);
int main()
{
int a = 3, b = 4;
printf("%d + %d = %d\n", a, b, operate(a, b, add));
printf("%d * %d = %d\n", a, b, operate(a, b, multi));
return 0;
}
int operate(int a, int b, int (* func) (int a, int b))
{
return func(a, b);
}
int add(int a, int b)
{
return a+b;
}
int multi(int a, int b)
{
return a*b;
}

The result of running the program is as follows:

$ gcc test.c
$ ./a.out
3 + 4 = 7
3 * 4 = 12

Share

Now I have an idea about the topic of a non-ARTS technical blog post, which is roughly divided into three parts. I am currently working hard on writing, come on!

--

--

KeepNewbie_yan

A programmer. Share knowledge of programming, operating system, Linux kernel, and reading, thinking etc. Let us maintain a beginner mend and grow together!