Preferential Attachment in Python

A few posts ago, I mentioned the model of network growth by preferential attachment. This is a big enough topic in network theory that it’s worth poking in detail. I discussed some of the subject’s history in this paper (1.2 Mb PDF, plus presentation), which they tell me passed some stage of review and will appear in the print volume of the conference proceedings, i.e., an overpriced Springer book nobody will actually buy. But in addition to learning the names and dates involved in the story, we would like to play around with the ideas ourselves.

The other day, I realized how easy this would be, and now that I’ve actually presented this to a classroom full of people, I’d like to write about it. Today, I’ll present a Python program for growing a network by the “rich get richer” effect known as preferential attachment.

This little tyke of a program will use two kinds of data structures: lists and dictionaries. A list is denoted by a set of comma-separated values enclosed in square brackets, thus:

beatles = ["john", "paul", "george", "ringo"]

Having built the list beatles in this way, we can manipulate it, for example by adding a fifth Beatle:

beatles.append("george martin")

The contents of our list after executing this command would be ["john", "paul", "george", "ringo", "george martin"]. Another useful item we will employ is the len() function, which returns the length of a list. The initial value of len(beatles) would be 4, and after adding “george martin” the command

print len(beatles)

would output the number 5.

A dictionary is much like a list, except that its values are labeled by keys. For example,

songAuthors = {"i me mine": "george", "lucy in the sky with diamonds": "john"}

The statement print songAuthors["i me mine"] would output the string george. For further details, see Guido van Rossum’s Python tutorial. Our representation of networks will follow that given in one of Guido’s essays: each node will be an item in a dictionary, and the value of that item will be a list of the other nodes to which it is connected.

Because scientists in general (and complex-systems people in particular) love pretty pictures, I’m going to include a feature for graphic output. Of the possible software implementations we could choose here, I’m picking SciPy (not to be confused with ScientificPython). Windows users can get SciPy and all the accessory packages bundled together by downloading Enthought Python. Linux users, or at least the Ubuntu and Debian people, can apt-get install the packages python-scipy and python-matplotlib. (You might as well grab python-scientific while you’re at it.) Mac people. . . er, try here; you should cruise by and pick up SciPy, NumPy and Matplotlib.

Finally, the code!

#!/usr/bin/python
# pref-attach.py
# Blake Stacey (bstaceyAT.alum.removethis.mit.andthis.edu)
import math, random
import scipy, pylab     # only if you want graphics


dNetwork = {}       # dictionary of lists
iNodes = 1000       # total number of nodes
iLinks = 0

for i in range(iNodes):
	dNetwork[i] = []      # initialize node of key i with empty list
	for node in dNetwork.values():
		fThresh = 1.0 / (iLinks + i + 1) * (len(node) + 1)
		if(random.random() <= fThresh):
			node.append(i)
			iLinks += 1

# calculate the degree distribution
lDegrees = [len(node) for node in dNetwork.values()]
print lDegrees

# and draw a histogram
pylab.hist(lDegrees,50)
pylab.xlabel("Node Degree")
pylab.ylabel("Number of Nodes")
pylab.title("Preferential Attachment")
pylab.show()

And, why not, a sample of the output:

Histogram of node degree

The only moderately tricky feature of the code (other than the external visual-output libraries) is the use of a "list comprehension" to calculate the node degrees. Even if you don't include the graphics part, printing out the list lDegrees should show you that there are a great many nodes with few connections and a smaller number of more heavily connected ones.

Characterizing the form of the degree distribution is left as an exercise to the interested reader. (Hint, hint.) Oh, all right, I might write about that soon, as well as addressing the topic of network visualization in more detail.

UPDATE (29 July 2007): Randall Farmer has ported this proglet to JavaScript.

4 thoughts on “Preferential Attachment in Python”

  1. Mmm, Python fun. And it worked with no post-installation wrangling! Must play with when I have more time.

    A post here a while back mentioned a paper describing the sexual network in a high school — who was with whom. When the researchers plotted the network out, it looked like a spanning tree with a lot of long “chains” of relationships (Alice was with Bob was with Carol was with Doug…) but fewer cycles than you’d expect (Doug wasn’t with Alice). They try a few explanations out, and they end up concluding that the funny structure emerged because people avoid dating their ex’s current boyfriend/girlfriend’s ex. That and the fact that most relationships were heterosexual led to few cycles of length 4, they said. Hm. I wouldn’t know; my ex’s boyfriend/girlfriend’s ex is None.

    (Some credit here goes to my engineer friend who pulled me into an instant-message conversation about this paper in the dark pre-Sunclipse ages.)

    Sadly, the data costs money, looks like.

  2. hi, I just find this webpage through google. I don’t know Python. But I want to know the theoretical model of your code because this doesn’t look like the famous BA model. BA model produces m connections in each time step. m is predetermined. But in your setting, the connection could be any since it is determined by the comparision between random number and fThresh for a node i. I don’t quite understand the underlying logic. Can you explain more?

  3. You’re right; it isn’t exactly the Barabasi-Albert model. However, when the multiplicative factor in the threshold probability calculation is small, the total number of edges which can be added at any timestep is also small, so (at least when I played with it) the result looks effectively indistinguishable from a network grown by the BA model. The probability of connecting to a node is still proportional to the number of edges that node already has; the only difference is that I didn’t preordain the number of edges created at each timestep.

Comments are closed.