Integrate Python and C/C++ (1)

The official Python documentation has two chapters on integrating Python with C/C++

  • Extending and Embedding – a tutorial for C/C++ programmers
  • Python/C API

In addition, several open-source tools make programmers' lives easier. The most popular ones are

  • SWIG
  • Boost.Python/pybind11
  • Cython
  • CFFI

This series of posts shares practical experience of integrating Python and C/C++.

Create a simple python module with C

The Python documentation gives a very simple example (spam) with full source code and a detailed explanation. What is missing is how to build and run it on a real computer, and what the output looks like.

Source Code

Here is the full source code with some concise comments

#include <Python.h>

/* module method

 The C function name does not matter, and the function should be static,
 because the method is not exposed by its implementation name; it is
 exposed through the method definition array (see below).

 As a practical convention, name your function module_method.
*/

static PyObject*
spam_system(PyObject *self, PyObject *args)
{
 const char* command;

 int rc;

 if (!PyArg_ParseTuple(args, "s", &command))
     return NULL;

 rc = system(command);
 return Py_BuildValue("i", rc);
}

/*
 An array of PyMethodDef is passed to module initializer.
 This array defines all methods, each item is

 { name, function_pointer, argument_type, description }

 This array should also be static (no need to be exposed)
*/
static PyMethodDef SpamMethods[] = {
 { "system", spam_system, METH_VARARGS, "Execute a shell command." },

 // other methods

 // end of list
 {NULL, NULL, 0, NULL }
};


/*
 Each module should have one (and only one) module initializer, which
 introduces the module object into the Python namespace.

 Its name MUST BE init<modulename> (here: initspam).

 For example, if the name is changed to init_spam, the code still
 compiles, but Python can't import it:

python test.py echo hello
Traceback (most recent call last):
 File "test.py", line 4, in <module>
 import spam
ImportError: dynamic module does not define init function (initspam)

*/
PyMODINIT_FUNC
initspam(void)
{
 PyObject *m;

 m = Py_InitModule("spam", SpamMethods);
 if (m == NULL)
     return;
}

Manually build (on linux)

The documentation doesn't describe the manual build process; it recommends the distutils module instead. That is good advice, but out of curiosity I decided to build manually anyway. Here is the makefile

$ cat manual.mk
# use pkg-config to find python information
#
# pkg-config --list-all
# pkg-config --cflags --libs python2
#
# -I/usr/include/python2.7 -I/usr/include/x86_64-linux-gnu/python2.7 -lpython2.7
CFLAGS = -g -fPIC -I/usr/include/python2.7
LDFLAGS = -g -shared -fPIC -L/usr/lib/python2.7
LIBS = -lpython2.7

all: spam.so

spam.o: spam.c
        $(CC) -c $(CFLAGS) -o $@ $<

spam.so: spam.o
        $(LD) -o $@ $< $(LDFLAGS) $(LIBS)

clean:
        rm spam.so spam.o

The build process

$ make -f manual.mk
cc -c -g -fPIC -I/usr/include/python2.7 -o spam.o spam.c
ld -o spam.so spam.o -g -shared -fPIC -L/usr/lib/python2.7 -lpython2.7

The python script that tests our new module spam

$ cat test.py
#! /usr/bin/env python

import sys,os
import spam

cmd = ' '.join(sys.argv[1:])
print 'cmd=[{}]'.format(cmd)
rc = spam.system(cmd)
print 'rc={:x}'.format(rc)

And here is the result of running it.

$ python test.py hello
cmd=[hello]
sh: 1: hello: not found
rc=7f00
$ python test.py echo hello
cmd=[echo hello]
hello
rc=0
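
The rc=7f00 printed above is the raw wait status that system() returns on Linux, not the exit code itself. A minimal sketch of how to split it, assuming the usual Linux layout (exit code in the high byte, terminating signal in the low 7 bits); the helper name decode_wait_status is hypothetical and not part of the spam module:

```python
def decode_wait_status(rc):
    """Split a 16-bit wait status such as the one returned by system() on Linux."""
    exit_code = (rc >> 8) & 0xFF   # high byte: exit code of the shell
    signal_no = rc & 0x7F          # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_no


# 0x7f00 is the value seen in the failing run above
print(decode_wait_status(0x7F00))
```

The exit code 0x7f is 127, which is what sh reports when the command is not found. In Python code the standard os.WEXITSTATUS() does the same extraction on Unix.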

Proper method of building (on linux)

As the Python documentation recommends, it is much easier (and more standard and flexible) to build with the distutils module

$ cat setup.py
#from distutils.core import setup, Extension
# distutils.core is obsolete, use setuptools instead
from setuptools import setup, Extension

# a package consists of one or more modules, each module consists of
# one or more source files
# this is module spam, built by source: spam.c
module1 = Extension('spam', sources = ['spam.c'])

# this is a package spam, consist of module1
setup(name='spam',
      version='1.0',
      description='This is a demo package',
      ext_modules=[module1])

The build process with distutils

$ python setup.py build
running build
running build_ext
building 'spam' extension
creating build
creating build/temp.linux-x86_64-2.7
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c spam.c -o build/temp.linux-x86_64-2.7/spam.o
creating build/lib.linux-x86_64-2.7
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/spam.o -o build/lib.linux-x86_64-2.7/spam.so
jasonz@jzdebian $ find build
build
build/lib.linux-x86_64-2.7
build/lib.linux-x86_64-2.7/spam.so
build/temp.linux-x86_64-2.7
build/temp.linux-x86_64-2.7/spam.o
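
The built module ends up in a build/lib.* directory whose name depends on the platform and Python version, so a test script cannot import it from the current directory. A small sketch of one way to find that directory and put it on sys.path without installing the package; the helper name find_build_lib is hypothetical, and it assumes the default build layout shown above:

```python
import glob
import os
import sys


def find_build_lib(build_root="build"):
    """Return the first build/lib.* directory under build_root, or None."""
    matches = sorted(glob.glob(os.path.join(build_root, "lib.*")))
    return matches[0] if matches else None


lib_dir = find_build_lib()
if lib_dir is not None:
    # make "import spam" pick up the freshly built extension
    sys.path.insert(0, lib_dir)
```

An alternative that avoids the path juggling is python setup.py build_ext --inplace, which places the built module next to the source file.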

Compare the outputs of the two build methods

$ ls -l spam.o build/lib.linux-x86_64-2.7/spam.so
-rwxr-xr-x 1 jasonz jasonz 17304 Nov 30 13:46 build/lib.linux-x86_64-2.7/spam.so
-rw-r--r-- 1 jasonz jasonz 16752 Nov 30 13:49 spam.o
$ file spam.o build/lib.linux-x86_64-2.7/spam.so
spam.o:                             ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
build/lib.linux-x86_64-2.7/spam.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=9cdc9b1ebf2df10d243fbd931ecea38f3bd63dfc, not stripped

Build with distutils on Windows

Python 3.5 and later support VS2015 and later. For Python 2.7, Microsoft offers a special package, Microsoft Visual C++ Compiler for Python 2.7, which is essentially the VC++ 2008 compiler toolchain.

Notes:

  1. Python 2.7 for Windows is built with VC2008. You can check this by launching the Python interpreter and reading its build info, which should show MSC v.1500.
  2. Without VC2008, setup.py fails with the error “Unable to find vcvarsall.bat”.
  3. Some old setup.py scripts use the module distutils.core, which lacks the auto-detection ability; replace it with setuptools.
  4. Microsoft requires setuptools 6.0 or later for the special version of VC++ to work.
  5. Download ez_setup.py for setuptools 6.0.1.
  6. Run the command python ez_setup.py.
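
The build info mentioned in the first note can also be read from a script. A minimal sketch that extracts the MSC number from the interpreter's version string; the helper name msc_version is hypothetical, and it returns None on interpreters that were not built with Microsoft's compiler:

```python
import re
import sys


def msc_version(version_string):
    """Return the MSC compiler number embedded in a version string, or None."""
    match = re.search(r"MSC v\.(\d+)", version_string)
    return int(match.group(1)) if match else None


# On Python 2.7 for Windows this is expected to print 1500 (VC2008)
print(msc_version(sys.version))
```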

Here is the process of building the module on Windows.

D:\codex\python\ext>python setup.py build
running build
running build_ext
building 'spam' extension
creating build
creating build\temp.win-amd64-2.7
creating build\temp.win-amd64-2.7\Release
C:\Users\jasonz\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Python27\include -IC:\Python27\PC /Tcspam.c /Fobuild\temp.win-amd64-2.7\Release\spam.obj
spam.c
creating build\lib.win-amd64-2.7
C:\Users\jasonz\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\Python27\libs /LIBPATH:C:\Python27\PCbuild\amd64 /LIBPATH:C:\Python27\PC\VS9.0\amd64 /EXPORT:initspam build\temp.win-amd64-2.7\Release\spam.obj /OUT:build\lib.win-amd64-2.7\spam.pyd /IMPLIB:build\temp.win-amd64-2.7\Release\spam.lib /MANIFESTFILE:build\temp.win-amd64-2.7\Release\spam.pyd.manifest
spam.obj : warning LNK4197: export 'initspam' specified multiple times; using first specification
   Creating library build\temp.win-amd64-2.7\Release\spam.lib and object build\temp.win-amd64-2.7\Release\spam.exp

D:\codex\python\ext>python test.py echo hello
cmd=[echo hello]
hello
rc=0

D:\codex\python\ext>python test.py  hello
cmd=[hello]
'hello' is not recognized as an internal or external command,
operable program or batch file.
rc=1

D:\codex\python\ext>

Parse Command Line arguments

Bash

#!/bin/bash
OPTIND=1           # reset in case getopts has been used previously

# print a short usage message (called for -h and for unknown options)
show_help() {
    echo "usage: $0 [-h] [-v] [-f output_file] [args...]"
}

# default settings
output=""
verbose=0
while getopts "h?vf:" opt; do
    case "$opt" in
    h|\?)  show_help; exit 1;;
    v) verbose=1;;
    f) output=$OPTARG;;
    esac
done

shift $((OPTIND-1))
[ "$1" = "--" ] && shift
echo "verbose=$verbose, output=$output, leftover: $*"


Perl

#! /usr/bin/env perl

use strict;
use warnings;
use Getopt::Std;

sub usage {
    print "usage:\n".
          "$0 -hvf other_list\n".
          "  -h     help\n".
          "  -v     verbose\n".
          "  -f file output file\n".
          "\n";
    exit 1;
}

my %opts;
my $rc = getopts('hvf:', \%opts);     # getopts returns true on success
usage() if (!$rc || $opts{h});
print "options:\n";
print "$_: $opts{$_}\n" foreach (sort keys %opts);
print "leftovers: @ARGV\n";


Python

#! /usr/bin/env python

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--output_file', type=str,
                    default='my_output', help='output file name')
parser.add_argument('-v', '--verbose', action='store_true', help='verbose mode')
args = parser.parse_args()
if args.verbose:
    print('verbose mode')
with open(args.output_file, 'w') as f:
    # do something
    pass
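
The Bash and Perl versions also report the leftover arguments, which the argparse script above does not collect. A sketch of one way to do that with a positional argument that accepts any number of values; the names build_parser, parse and leftover are chosen for this illustration only:

```python
import argparse


def build_parser():
    """Parser that mirrors the -v / -f options of the Bash and Perl scripts."""
    parser = argparse.ArgumentParser(description="getopts-style demo")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose mode")
    parser.add_argument("-f", "--output_file", default="", help="output file name")
    # nargs="*" collects every remaining positional argument into a list
    parser.add_argument("leftover", nargs="*", help="remaining arguments")
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    return args.verbose, args.output_file, args.leftover


print(parse(["-v", "-f", "out.txt", "a", "b"]))
```

Unlike getopts, argparse generates the -h help text by itself, so no usage function is needed.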


C/C++

under construction

Perl Tricks – each %hash

Perl has three ways to iterate a hash:

  • keys %hash iterates keys
  • values %hash iterates values
  • each %hash iterates the pair (key, value)

Perl keeps one internal iterator per hash for this kind of iteration. So far so good, as long as you finish the iteration every time. But what if you break out of the iteration before it finishes and start a new one later? According to the Perl documentation, the iterator is reset only when

  • the iteration reaches the end
  • keys or values is called on the hash

so a script that follows this pattern might not work as you expect:

while (my ($k, $v) = each %hash) {
      # do some work
      last if ($k eq $somekey || $v == $somevalue);   # keys are strings, so compare with eq
}
# sometime later
while (my ($k, $v) = each %hash) {  # resumes where the first loop stopped: the iterator was not reset
         # some work
}

The conclusion: don't use each if you plan to break out of the iteration; use keys instead.
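
To make the pitfall concrete, here is a toy model of the behaviour in Python. It is only an illustration of a container with a single shared cursor, not how Perl implements hashes; the class name PerlLikeHash and its methods are invented for this sketch:

```python
class PerlLikeHash:
    """Toy container with ONE shared cursor, mimicking Perl's each/keys."""

    def __init__(self, data):
        self._items = list(data.items())
        self._pos = 0                      # the shared internal iterator

    def each(self):
        if self._pos >= len(self._items):
            self._pos = 0                  # reset only at the end of iteration
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item

    def keys(self):
        self._pos = 0                      # calling keys resets the iterator
        return [k for k, _ in self._items]


h = PerlLikeHash({"a": 1, "b": 2, "c": 3})
h.each()                                   # first loop stops early after 'a'

seen = []
pair = h.each()
while pair is not None:                    # second loop silently skips 'a'
    seen.append(pair[0])
    pair = h.each()
print(seen)
```

The second loop never sees the first key, which is exactly the surprise described above. Calling h.keys() before the second loop resets the cursor and restores the full iteration.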

 

Capture signals in perl

Here’s a sample perl script

#! /usr/bin/perl
=pod

This program demonstrates the usage of %SIG to capture signals in perl.

An alternative solution is using pragma sigtrap.
    use sigtrap qw(handler hdl_int INT QUIT);
The disadvantage of sigtrap is that you can't save and restore
the original handlers.

=cut

use strict;
use warnings;
use Time::HiRes qw( usleep) ;

package ST;             # scope tracer
sub new {
    my $class = shift;
    my $name = shift;
    my $self = {
        name => $name,
    };
    print "++++++++++++++++enter $self->{name}\n";
    bless $self, $class;
    return $self;
}
sub DESTROY {
    my $self = shift;
    print "----------------leave $self->{name}\n";
}



package main;

my %orgSig = ();        # save original handlers if restoring is wished

sub listAllSignals {
    print "supported signals\n";
    foreach (sort keys %SIG) {
        unless (/NUM\d+/) {
            my $v = $SIG{$_} || '';
            print "  $_=$v\n";
        }
    }
}

sub longtask {
    # emulating a task that runs for a while.
    my $title = shift || 'some long-time task';
    my $loop = shift || 5;
    my $interval = shift || 1;
    my $v = ST->new($title);
    foreach (1..$loop) {
        print "      $title: working ...$_\n";
        usleep($interval * 1000000);
    }
}

sub foo {
    my $s = ST->new('foo');
    my $greet = shift;

    print "$greet from foo\n";
    longtask('foo body', 10, 0.5);
}

sub hdl_int {
# While the handler is running, further signals are blocked and queued,
# so restoring the original handler inside the handler
# IS GENERALLY NOT a good idea unless that is the expected behavior
    my $s = ST->new('hdl_int');
    longtask('hdl_int body', 5, 0.2);
#   $SIG{'INT'} = $orgSig{'INT'};
}

$orgSig{$_} = $SIG{$_} foreach (keys %SIG);
listAllSignals();

# these two signals are used for "normal" exit
$SIG{'INT'} = \&hdl_int;
$SIG{TERM} = sub { print "I captured TERM\n"; };

# KILL (9) can never be captured (the same goes for STOP),
#   so the KILL handler below has no effect.
# QUIT, on the other hand, can be captured like any other signal,
#   as observed on debian 8
$SIG{QUIT} = sub { print "I captured QUIT\n"; };
$SIG{KILL} = sub { print "I captured KILL\n"; };


print("hello world, my pid=$$\n");
sleep(3);
foo("Greeting");
print("The end\n");

Here is the output when Control-C is pressed twice while the script is running

jasonz@jzdebian$ perl sig.pl
supported signals
  ABRT=
  ALRM=
  BUS=
  CHLD=
  CLD=
  CONT=
  FPE=IGNORE
  HUP=
  ILL=
  INT=
  IO=
  IOT=
  KILL=
  PIPE=
  POLL=
  PROF=
  PWR=
  QUIT=
  RTMAX=
  RTMIN=
  SEGV=
  STKFLT=
  STOP=
  SYS=
  TERM=
  TRAP=
  TSTP=
  TTIN=
  TTOU=
  UNUSED=
  URG=
  USR1=
  USR2=
  VTALRM=
  WINCH=
  XCPU=
  XFSZ=
hello world, my pid=47074
^C++++++++++++++++enter hdl_int
++++++++++++++++enter hdl_int body
      hdl_int body: working ...1
      hdl_int body: working ...2
      hdl_int body: working ...3
^C      hdl_int body: working ...4
      hdl_int body: working ...5
----------------leave hdl_int body
----------------leave hdl_int
++++++++++++++++enter hdl_int
++++++++++++++++enter hdl_int body
      hdl_int body: working ...1
      hdl_int body: working ...2
      hdl_int body: working ...3
      hdl_int body: working ...4
      hdl_int body: working ...5
----------------leave hdl_int body
----------------leave hdl_int
++++++++++++++++enter foo
Greeting from foo
++++++++++++++++enter foo body
      foo body: working ...1
      foo body: working ...2
      foo body: working ...3
      foo body: working ...4
      foo body: working ...5
      foo body: working ...6
      foo body: working ...7
      foo body: working ...8
      foo body: working ...9
      foo body: working ...10
----------------leave foo body
----------------leave foo
The end
jasonz@jzdebian$ 

Note that QUIT is captured as well. This is expected: on Linux only KILL and STOP cannot be caught, blocked, or ignored. Here is a snippet of the output

  XCPU=
  XFSZ=
hello world, my pid=47344
++++++++++++++++enter foo
Greeting from foo
++++++++++++++++enter foo body
      foo body: working ...1
      foo body: working ...2
      foo body: working ...3
I captured QUIT
      foo body: working ...4
      foo body: working ...5
      foo body: working ...6
      foo body: working ...7
      foo body: working ...8
      foo body: working ...9
      foo body: working ...10
----------------leave foo body
----------------leave foo
The end
jasonz@jzdebian$ 

The QUIT signal was sent from another terminal

jasonz@jzdebian$ kill -QUIT 47344
jasonz@jzdebian$ 
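
Which signals can be captured is easy to probe from a script. The sketch below, written for Python 3 on a Unix system and assumed to run in the main thread, tries to install a do-nothing handler and then restores the saved one, which is the same save-and-restore idea as %orgSig above. The function name can_catch is hypothetical:

```python
import signal


def can_catch(signum):
    """Return True if a handler can be installed for signum."""
    def _noop(sig, frame):
        pass

    try:
        previous = signal.signal(signum, _noop)   # returns the old handler
    except OSError:
        return False                              # the kernel refuses, e.g. for KILL
    if previous is not None:
        signal.signal(signum, previous)           # restore the original handler
    return True


for name in ("SIGINT", "SIGTERM", "SIGQUIT", "SIGKILL", "SIGSTOP"):
    print(name, can_catch(getattr(signal, name)))
```

On Linux this should report that QUIT can be caught while KILL and STOP cannot.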

Extended Usage – capture warn() and die()

This is extremely convenient for logging all such messages

sub WARN_handler {
    my ($signal) = @_;
    write_log("WARN: $signal");
}
sub DIE_handler {
    my ($signal) = @_;
    write_log("DIE: $signal");
}
# not named "log": a plain log(...) call would invoke Perl's built-in logarithm
sub write_log {
    my (@array) = @_;
    open(my $fh, '>>', 'my.log') or return;   # give up quietly if the log can't be opened
    print $fh @array;
    close($fh);
}

$SIG{__WARN__} = 'WARN_handler';
$SIG{__DIE__} = 'DIE_handler';
chdir('/printer') or warn($!);
chdir('/printer') or die($!);
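
For comparison, roughly the same effect can be had in Python by replacing warnings.showwarning and sys.excepthook. This is only a rough counterpart, since sys.excepthook sees uncaught exceptions only; the helper name install_log_hooks and the log format are assumptions of this sketch:

```python
import sys
import warnings


def install_log_hooks(log_path):
    """Append warnings and uncaught exceptions to log_path."""
    def write_log(text):
        with open(log_path, "a") as log_file:
            log_file.write(text + "\n")

    def on_warning(message, category, filename, lineno, file=None, line=None):
        # rough counterpart of $SIG{__WARN__}
        write_log("WARN: %s" % message)

    def on_uncaught(exc_type, exc, tb):
        # rough counterpart of $SIG{__DIE__}: log, then print the usual report
        write_log("DIE: %s" % exc)
        sys.__excepthook__(exc_type, exc, tb)

    previous = (warnings.showwarning, sys.excepthook)
    warnings.showwarning = on_warning
    sys.excepthook = on_uncaught
    return previous   # keep these to restore the original hooks later
```

After install_log_hooks("my.log"), a call such as warnings.warn("cannot chdir") is appended to my.log instead of being printed to stderr.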

Parsing XML with Perl

#! /usr/bin/perl

use strict;
use warnings;

use XML::Parser;

#
# A simple script that shows how the Expat stream parser works
#

my $depth = 0;
sub indent      # indent 4 spaces when depth increases
{
    my $s = shift;
    my $i = '';
    $i .= '    ' for (1..$depth);
    join("\n", map { $i.$_ } split("\n", $s))."\n";
}

# stream: event => handler
my $handlers = {
    Start => sub {
        my ($expat, $ele, %attr) = @_;
        my $s = "start: [$ele]\n";
        while (my ($k,$v) = each(%attr))
            { $s .= "    ----$k=$v\n"; }
        print indent("$s");
        $depth++;
    },
    End => sub {
        my ($expat, $ele) = @_;
        $depth--;
        print indent("end: [$ele]");
    },
    Char => sub {
        my ($expat, $str) = @_;
        $str =~ s/^\s*//;
        $str =~ s/\s*$//;
        print indent("char: [$str]") if (length($str));
    },
    Comment => sub {
        my ($expat, $str) = @_;
        print indent("comment: [$str]");
    },
    CdataStart => sub {
        my ($expat) = @_;
        print indent("cdataStart");
    },
    CdataEnd => sub {
        my ($expat) = @_;
        print indent("cdataEnd");
    },
};


my $p = XML::Parser->new( Handlers => $handlers );
my $fname = $ARGV[0];
eval { $p->parsefile($fname); };
print "ERROR: $@\n" if ($@);


Run the script with this input

<?xml version="1.0" encoding="utf-8"?>
<TestEnvironment name="test.xml" id="1" envFile="~/testhome/test.xml">
    <!--This is a comment-->
    <SqlWCharEncoding>UTF-16</SqlWCharEncoding>
    <empty />
    <BaselineDirectory>
        ResultSets
        <TSE-Sql2TableColumnMap>stcmap.txt</TSE-Sql2TableColumnMap>
        Post char
    </BaselineDirectory>
    <SQL>
      before
      <![CDATA[select * from mytable; <tag /> ]]>
      after
    </SQL>
</TestEnvironment>

and the result is

jasonz@jzdebian$ ./xml-test.pl ~/testhome/test.xml
start: [TestEnvironment]
    ----name=test.xml
    ----envFile=~/testhome/test.xml
    ----id=1
    comment: [This is a comment]
    start: [SqlWCharEncoding]
        char: [UTF-16]
    end: [SqlWCharEncoding]
    start: [empty]
    end: [empty]
    start: [BaselineDirectory]
        char: [ResultSets]
        start: [TSE-Sql2TableColumnMap]
            char: [stcmap.txt]
        end: [TSE-Sql2TableColumnMap]
        char: [Post char]
    end: [BaselineDirectory]
    start: [SQL]
        char: [before]
        cdataStart
        char: [select * from mytable; <tag />]
        cdataEnd
        char: [after]
    end: [SQL]
end: [TestEnvironment]
jasonz@jzdebian$
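
Python ships a binding to the same Expat library, so the event flow can be reproduced there as well. The sketch below mirrors the Perl handlers with the standard xml.parsers.expat module; the function name trace_events is invented for this example, and it collects the lines in a list instead of printing them:

```python
import xml.parsers.expat


def trace_events(xml_text):
    """Return one indented line per Expat event, like the Perl script prints."""
    events = []
    depth = [0]                               # current nesting level

    def emit(text):
        events.append("    " * depth[0] + text)

    def start(name, attrs):
        emit("start: [%s]" % name)
        for key in sorted(attrs):
            emit("    ----%s=%s" % (key, attrs[key]))
        depth[0] += 1

    def end(name):
        depth[0] -= 1
        emit("end: [%s]" % name)

    def chars(data):
        data = data.strip()                   # drop whitespace-only text
        if data:
            emit("char: [%s]" % data)

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True                 # deliver adjacent text in one piece
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.CommentHandler = lambda text: emit("comment: [%s]" % text)
    parser.StartCdataSectionHandler = lambda: emit("cdataStart")
    parser.EndCdataSectionHandler = lambda: emit("cdataEnd")
    parser.Parse(xml_text, True)
    return events


for line in trace_events('<a x="1"><!--c--><b>hi</b><![CDATA[<t/>]]></a>'):
    print(line)
```

Attributes are sorted here to make the order stable; the Perl script prints them in hash order, which is why id appears last in the output above.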