Commit dd63cfd

committed
First proper release
1 parent 18ba152 commit dd63cfd

File tree

3 files changed

+474
-1
lines changed

Diff for: README.md

+238-1
@@ -1 +1,238 @@
1-
# sockpuppet
1+
# SockPuppet
2+
#### Having fun with WebSockets, Python, Golang and nytimes.com <br>
3+
<img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px"> <img src ="http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg" height="50px">
4+
5+
6+
<br>
7+
### What's this all about?
8+
Did you ever wonder how **nytimes.com** pushes breaking news articles to the front page while you have it open in your browser? Well, I used my browser's developer tools to look at what's going on, and it turns out they don't periodically reload JSON data but use websockets to push new events directly to your browser ([see here](https://developer.mozilla.org/en-US/docs/WebSockets) for more information about websockets).<br>
9+
It's a system called `nyt-fabrik`. There are a few talks and presentations that give some insight into the architecture: [search google for "nytimes fabrik websockets"](https://www.google.com/search?q=nytimes+fabrik+websockets).
10+
11+
There is example code, see [here for the Python code](blob/master/sockpuppet.py) and [here for the Golang example](blob/master/sockpuppet.go).
12+
13+
<br>
14+
### Cool, so how does it work?
15+
16+
When you go to **nytimes.com**, your browser will establish a websocket connection with the NYT fabrik server and, after a little login dance, will start listening for news events.
17+
Your browser opens a websocket TCP connection to e.g. `ws://blablabla.fabrik.nytimes.com./123/abcde123/websocket` and the server sends a one-character frame `o` which is a request to provide some sort of login identification.<br>
18+
The client (your browser) responds with `["{\"action\":\"login\",\"client_app\":\"hermes.push\",\"cookies\":{\"nyt-s\":\"SOME_COOKIE_VALUE_HERE\"}}"]` and, next thing you know, you receive either an `h` every 20-30 seconds, which is some sort of keep-alive, or a frame that starts with `a` and has all sorts of data encoded as JSON.
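
To make the three frame types concrete, here is a minimal sketch that sorts a raw text frame by its first character. `FRAME_KINDS` and `classify_frame` are names I made up for illustration; they are not part of the example code in this repo.

```python
# Hypothetical helper, not part of sockpuppet.py: map the one-character
# prefix described above to a readable frame kind.
FRAME_KINDS = {
    "o": "open",       # server asks us to log in
    "h": "heartbeat",  # keep-alive, roughly every 20-30 seconds
    "a": "data",       # JSON-encoded payload follows the prefix
}


def classify_frame(payload):
    """Return (kind, rest) for a raw text frame from the server."""
    if not payload:
        return "empty", ""
    kind = FRAME_KINDS.get(payload[0], "unknown")
    return kind, payload[1:]


if __name__ == "__main__":
    for raw in ["o", "h", 'a[{"type":"feeds_item"}]', "x?"]:
        print(classify_frame(raw))
```

Anything that does not start with one of the three known prefixes is reported as `unknown` instead of raising, so a read loop can log it and carry on.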
19+
20+
If we receive a message starting with `a`, we can strip the first character and JSON decode the rest.
21+
22+
```json
23+
{
24+
"body": "{\"status\":\"updated\",\"version\":1,\"links\":[{\"url\":\"http://www.nytimes.com/2015/05/26/us/cleveland-police.html\",\"count\":0,\"content_id\":\"100000003702598\",\"content_type\":\"article\",\"offset\":0}],\"title\":\"Cleveland Is Said to Settle Justice Department Lawsuit Over Policing\",\"start_time\":1432581057,\"display_duration\":null,\"label\":\"Breaking News\",\"last_modified\":1432581057,\"display_type_id\":1,\"end_time\":1432581057,\"id\":34131339,\"sub_type\":\"BreakingNews\"}",
25+
"timestamp": "2015-05-21T11:21:11.123456Z",
26+
"hash_key": "34131339",
27+
"uuid": "1234",
28+
...
29+
"account": "nyt1",
30+
"type": "feeds_item"
31+
}
32+
```
33+
34+
If the decoded message has a `body` field, that field is itself a JSON-encoded string that we can decode in a second step. For a breaking news item it looks something like this:
35+
36+
```json
37+
{"status": "updated", "sub_type": "BreakingNews",
38+
"links": [{"url": "http://www.nytimes.com/2015/05/26/us/cleveland-police.html", "count": 0, "content_id": "100000003702598", "content_type": "article", "offset": 0}],
39+
"title": "Cleveland Is Said to Settle Justice Department Lawsuit Over Policing",
40+
"start_time": 1432581057, "display_duration": null, "label": "Breaking News",
41+
"version": 1, "display_type_id": 1, "end_time": 1432581057,
42+
"last_modified": 1432581057, "id": 34131339}
43+
```
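
The two decoding steps can be sketched in a few lines of Python. This is only an illustration: `decode_data_frame` is a hypothetical helper, and it assumes that whatever follows the leading `a` is either one JSON object or a JSON array of objects (the Go example decodes it as an array).

```python
import json


def decode_data_frame(payload):
    """Decode an `a` frame into a list of (envelope, body) pairs.

    Assumption: after the leading `a` the frame holds either a single JSON
    object or a JSON array of objects; both shapes are accepted here.
    """
    if not payload.startswith("a"):
        raise ValueError("not a data frame")
    decoded = json.loads(payload[1:])
    envelopes = decoded if isinstance(decoded, list) else [decoded]
    result = []
    for envelope in envelopes:
        raw_body = envelope.get("body")
        # `body` is a JSON document stored as a string, so it needs a
        # second json.loads call.
        body = json.loads(raw_body) if raw_body else None
        result.append((envelope, body))
    return result


if __name__ == "__main__":
    inner = json.dumps({"sub_type": "BreakingNews", "title": "Example headline"})
    frame = "a" + json.dumps([{"type": "feeds_item", "body": inner}])
    for envelope, body in decode_data_frame(frame):
        print(envelope["type"], body["title"])
```

Envelopes without a `body` come back paired with `None`, so the caller can tell a login acknowledgement from an actual news item.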
44+
<br>
45+
### Neat but how do I access the feed programmatically?
46+
47+
Good question. We need about 3-4 things to get this to work, and none of them is hard. For the Python example, I'll be using the [Tornado websocket framework](http://tornado.readthedocs.org/en/latest/websocket.html) and for the Golang example I'll be using the [Golang.org websocket package](https://godoc.org/golang.org/x/net/websocket).
48+
49+
#### Connect to the websocket
50+
51+
In Python, this is easy:
52+
53+
```python
54+
url = "ws://blablabla.fabrik.nytimes.com./123/abcdef123/websocket"
55+
try:
56+
w = yield tornado.websocket.websocket_connect(url, connect_timeout=5)
57+
logging.info("Connected to %s", url)
58+
except Exception as ex:
59+
logging.error("couldn't connect, err: %s", ex)
60+
```
61+
62+
In Golang, it looks about the same:
63+
64+
```go
65+
addr := "ws://blablabla.fabrik.nytimes.com./123/abcdef123/websocket"
66+
ws, err := websocket.Dial(addr, "", "http://www.nytimes.com/")
67+
if err != nil {
68+
log.Fatal(err)
69+
}
70+
log.Printf("Connected to %s", addr)
71+
```
72+
That was easy, wasn't it?
73+
74+
#### Listen for incoming messages
75+
Good, we now are connected and have a websocket object/struct we can work with, let's listen for incoming messages.<br>
76+
77+
Python:
78+
79+
```python
80+
while True:
81+
payload = yield w.read_message()
82+
if payload is None:
83+
logging.error("uh oh, we got disconnected")
84+
return
85+
```
86+
and in Golang:
87+
88+
```go
89+
var msgBuf = make([]byte, 4096)
90+
for {
91+
bufLen, err := ws.Read(msgBuf)
92+
if err != nil {
93+
log.Printf("read err: %s", err)
94+
return
95+
}
96+
```
97+
One caveat here: the Golang version reads into a fixed 4k buffer, so a message longer than 4k arrives in 4k pieces rather than in one read. For our purposes that's not an issue.
98+
99+
#### Send the login message
100+
101+
If we receive `o` we need to send the login message. We need a cookie value so let's make one up:
102+
103+
```python
104+
if payload[0] == "o":
105+
cookie = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(32))
106+
msg = json.dumps(['{"action":"login", "client_app":"hermes.push", "cookies":{"nyt-s":"%s"}}' % cookie])
107+
w.write_message(msg.encode('utf8'))
108+
logging.info("sent cookie: %s", cookie)
109+
```
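
If you'd rather not hand-write the escaped string, the same message can be built with two `json.dumps` calls. This is a sketch, with `make_cookie` and `build_login_message` as hypothetical helper names; the field names are the ones shown in the browser's login frame above.

```python
import json
import random
import string


def make_cookie(length=32):
    """Make up a random alphanumeric cookie value."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


def build_login_message(cookie):
    """Build the login frame: a JSON array holding one JSON-encoded string."""
    inner = json.dumps({
        "action": "login",
        "client_app": "hermes.push",
        "cookies": {"nyt-s": cookie},
    })
    # Encoding the inner document a second time yields the escaped quotes
    # seen in the frame the browser sends.
    return json.dumps([inner])


if __name__ == "__main__":
    print(build_login_message(make_cookie()))
```

Letting `json.dumps` do the escaping also keeps the message valid if the cookie value ever contains a quote or a backslash.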
110+
111+
In Golang this is a bit more verbose:
112+
113+
```go
114+
if msgBuf[0] == 'o' {
115+
// reply to the login request
116+
cookie := randCookie()
117+
msg := fmt.Sprintf(`["{\"action\":\"login\", \"client_app\":\"hermes.push\", \"cookies\":{\"nyt-s\":\"%s\"}}"]`, cookie)
118+
_, err := ws.Write([]byte(msg))
119+
if err != nil {
120+
log.Fatal(err)
121+
}
122+
log.Printf("Sent cookie: %s\n", cookie)
123+
}
124+
```
125+
and `randCookie()` looks like this:
126+
127+
```go
128+
func randCookie() string {
129+
letters := []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
130+
b := make([]rune, 30)
131+
for i := range b {
132+
b[i] = letters[rand.Intn(len(letters))]
133+
}
134+
return string(b)
135+
}
136+
```
137+
138+
#### Patiently wait; and (mostly) ignore the `h` messages
139+
Nothing much to do here, whenever we get a `h` message we can simply write `ping` to the console.
140+
141+
```python
142+
elif payload[0] == 'h':
143+
logging.info('ping')
144+
```
145+
and
146+
147+
```go
148+
if msgBuf[0] == 'h' {
149+
log.Println("ping")
150+
}
151+
```
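
The keep-alives are also handy for noticing a dead connection: if no `h` has shown up for a while, something is probably wrong. Below is a small sketch of that idea. `HeartbeatWatchdog` is a hypothetical class, and the 90 second timeout is my own guess (a few missed 20-30 second intervals), not something the server specifies.

```python
import time


class HeartbeatWatchdog:
    """Track when the last keep-alive was seen."""

    def __init__(self, timeout=90.0, clock=time.monotonic):
        # timeout is an assumption: roughly three missed heartbeats.
        self._timeout = timeout
        self._clock = clock
        self._last_seen = clock()

    def beat(self):
        """Call this whenever an `h` frame arrives."""
        self._last_seen = self._clock()

    def is_stale(self):
        """True if no heartbeat was seen within the timeout."""
        return self._clock() - self._last_seen > self._timeout


if __name__ == "__main__":
    watchdog = HeartbeatWatchdog()
    watchdog.beat()
    print("stale?", watchdog.is_stale())
```

The clock is passed in so the timeout logic can be exercised without actually waiting.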
152+
153+
154+
#### Decode the news alert message when we receive one
155+
156+
Messages from the server that start with `a` contain JSON encoded data that we can decode.
157+
Python first:
158+
159+
```python
160+
elif payload[0] == 'a':
161+
frame = json.loads(payload[1:])
162+
if 'body' in frame:
163+
body = json.loads(frame['body'])
164+
```
165+
Now you can check `if body['sub_type'] == "BreakingNews"` or do whatever else you plan on doing with this.
166+
167+
In Golang everything is a bit more verbose but roughly works the same (inlined and shortened for brevity).
168+
169+
```go
170+
if payload[0] == 'a' {
171+
172+
frame := []struct {
173+
UUID string `json:"uuid"`
174+
Product string `json:"product"`
175+
Project string `json:"project"`
176+
...
177+
Body string `json:"body,omitempty"`
178+
}{}
179+
180+
// [1:] as we want to skip the leading character `a`
181+
err = json.Unmarshal(payload[1:], &frame)
182+
if err != nil {
183+
return
184+
}
185+
if len(frame) > 0 && len(frame[0].Body) > 1 {
186+
// here we should try to JSON unmarshal frame[0].Body
187+
}
188+
}
189+
190+
```
191+
`frame[0].Body` can now be unmarshaled in the same way as `payload[1:]` earlier.
192+
The resulting struct for it looks something like this:
193+
194+
```go
195+
type MessageBody struct {
196+
ID int `json:"id"`
197+
Title string `json:"title"`
198+
Status string `json:"status"`
199+
Version int `json:"version"`
200+
SubType string `json:"sub_type"`
201+
Label string `json:"label"`
202+
StartTime int `json:"start_time"`
203+
EndTime int `json:"end_time"`
204+
LastModified int `json:"last_modified"`
205+
Links []struct {
206+
URL string `json:"url"`
207+
ContentID string `json:"content_id"`
208+
} `json:"links"`
209+
}
210+
211+
```
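
For comparison, here is roughly what a typed version could look like on the Python side. It's a sketch: the `MessageBody` dataclass below is hypothetical, it only keeps a handful of fields, and it assumes that `start_time` is seconds since the Unix epoch (which fits the sample value).

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MessageBody:
    id: int
    title: str
    sub_type: str
    label: str
    start_time: datetime
    urls: list

    @classmethod
    def from_json(cls, raw):
        """Build a MessageBody from the JSON string found in `body`."""
        data = json.loads(raw)
        return cls(
            id=data["id"],
            title=data["title"],
            sub_type=data["sub_type"],
            label=data["label"],
            # Assumption: start_time is seconds since the Unix epoch.
            start_time=datetime.fromtimestamp(data["start_time"], tz=timezone.utc),
            urls=[link["url"] for link in data.get("links", [])],
        )


if __name__ == "__main__":
    raw = json.dumps({
        "id": 34131339, "title": "Example headline", "sub_type": "BreakingNews",
        "label": "Breaking News", "start_time": 1432581057,
        "links": [{"url": "http://www.nytimes.com/"}],
    })
    print(MessageBody.from_json(raw))
```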
212+
213+
<br>
214+
### Sweet but what do I do with this?
215+
216+
Totally up to you. Send yourself an email or txt msg using Twilio or Plivo every time something happens. For example, I wrote a little app using the Plivo API to send breaking news txts, you can subscribe by texting `news` to <a href="tel:+17185771913">+1-718-577-1913</a> if you want to give it a try (but no guarantees for how long I'll keep the service up).<br>
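
Whatever you end up sending, you'll probably want to turn the decoded body into a short line of text and avoid sending the same item twice. Here is a minimal sketch of both. `format_alert` and `AlertFilter` are hypothetical names, and treating `(id, version)` as the identity of an alert is my assumption, not something I have confirmed about the feed.

```python
def format_alert(body):
    """Turn a decoded message body (a dict) into one line of text."""
    links = body.get("links") or []
    url = links[0]["url"] if links else ""
    text = "%s: %s" % (body.get("label", "News"), body["title"])
    return (text + " " + url).strip()


class AlertFilter:
    """Let each breaking news (id, version) pair through only once."""

    def __init__(self):
        self._seen = set()

    def should_send(self, body):
        if body.get("sub_type") != "BreakingNews":
            return False
        # Assumption: id plus version identifies one revision of an alert.
        key = (body.get("id"), body.get("version"))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    body = {
        "id": 1, "version": 1, "sub_type": "BreakingNews",
        "label": "Breaking News", "title": "Example headline",
        "links": [{"url": "http://www.nytimes.com/"}],
    }
    alerts = AlertFilter()
    if alerts.should_send(body):
        print(format_alert(body))
```

The string returned by `format_alert` is what you would hand to your email or SMS code.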
217+
218+
219+
### Cool, how do I run the examples?
220+
221+
Python
222+
223+
```
224+
python sockpuppet.py --ws_addr="ws://<<ADDRESS HERE>>"
225+
```
226+
227+
Go
228+
229+
```
230+
go run sockpuppet.go --ws_addr="ws://<<ADDRESS HERE>>"
231+
```
232+
233+
You can find a valid websocket host by opening the developer console of your favorite browser, visiting [nytimes.com](http://www.nytimes.com/) and looking for websocket connections in the network tab.
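
Before passing an address to `--ws_addr`, it can help to sanity-check what you copied out of the network tab. The sketch below uses only the standard library; `parse_ws_addr` is a hypothetical helper, and the default ports are the usual 80 for `ws://` and 443 for `wss://`.

```python
from urllib.parse import urlsplit

# Standard default ports for the two websocket schemes.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def parse_ws_addr(addr):
    """Split a websocket address into (host, port, path) or raise ValueError."""
    parts = urlsplit(addr)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("expected a ws:// or wss:// address, got %r" % addr)
    if not parts.hostname:
        raise ValueError("no host in %r" % addr)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, parts.path or "/"


if __name__ == "__main__":
    print(parse_ws_addr("ws://blablabla.fabrik.nytimes.com./123/abcdef123/websocket"))
```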
234+
235+
236+
237+
238+

Diff for: sockpuppet.go

+155
@@ -0,0 +1,155 @@
1+
package main
2+
3+
import (
4+
"encoding/json"
5+
"flag"
6+
"fmt"
7+
"log"
8+
"math/rand"
9+
"strings"
10+
"time"
11+
12+
"golang.org/x/net/websocket"
13+
)
14+
15+
var (
16+
wsAddr = flag.String("ws_addr", "", "Address of the host to connect to")
17+
)
18+
19+
type ServerMessage []struct {
20+
UUID string `json:"uuid"`
21+
Timestamp string `json:"timestamp"`
22+
Region string `json:"region"`
23+
Zone string `json:"zone"`
24+
Product string `json:"product"`
25+
Project string `json:"project"`
26+
Environment string `json:"environment"`
27+
Type string `json:"type"`
28+
Body string `json:"body,omitempty"`
29+
}
30+
31+
type MessageBody struct {
32+
ID int `json:"id"`
33+
Title string `json:"title"`
34+
Status string `json:"status"`
35+
Version int `json:"version"`
36+
SubType string `json:"sub_type"`
37+
Label string `json:"label"`
38+
StartTime int `json:"start_time"`
39+
EndTime int `json:"end_time"`
40+
LastModified int `json:"last_modified"`
41+
42+
Links []struct {
43+
URL string `json:"url"`
44+
Count int `json:"count"`
45+
ContentID string `json:"content_id"`
46+
ContentType string `json:"content_type"`
47+
Offset int `json:"offset"`
48+
} `json:"links"`
49+
}
50+
51+
func main() {
52+
rand.Seed(time.Now().Unix())
53+
flag.Parse()
54+
55+
if *wsAddr == "" {
56+
fmt.Println("Need to provide a valid host via --ws_addr=\"\"")
57+
return
58+
}
59+
60+
addr := *wsAddr
61+
switch {
62+
case strings.HasPrefix(addr, "ws://"):
63+
addr = strings.Replace(*wsAddr, ".com./", ".com.:80/", 1)
64+
case strings.HasPrefix(addr, "wss://"):
65+
addr = strings.Replace(*wsAddr, ".com./", ".com.:443/", 1)
66+
}
67+
68+
ws, err := websocket.Dial(addr, "", "http://www.nytimes.com/")
69+
if err != nil {
70+
log.Fatal(err)
71+
}
72+
log.Printf("Connected to %s", addr)
73+
74+
var msgBuf = make([]byte, 4096)
75+
for {
76+
bufLen, err := ws.Read(msgBuf)
77+
if err != nil {
78+
log.Printf("read err: %s", err)
79+
break
80+
}
81+
82+
if bufLen < 1 {
83+
continue
84+
}
85+
86+
switch msgBuf[0] {
87+
case 'o':
88+
// reply to the login request
89+
cookie := randCookie()
90+
msg := fmt.Sprintf(`["{\"action\":\"login\", \"client_app\":\"hermes.push\", \"cookies\":{\"nyt-s\":\"%s\"}}"]`, cookie)
91+
_, err := ws.Write([]byte(msg))
92+
if err != nil {
93+
log.Fatal(err)
94+
}
95+
log.Printf("Sent cookie: %s\n", cookie)
96+
97+
case 'h':
98+
// keep-alive
99+
log.Println("ping")
100+
101+
case 'a':
102+
// some JSON encoded data, let's decode it!
103+
msg, err := decodeServerMessage(msgBuf[1:bufLen])
104+
if err != nil {
105+
log.Printf("no good: %s", err)
106+
continue
107+
}
108+
109+
// response to the login message?
110+
if msg[0].Product == "core" && msg[0].Project == "standard" {
111+
log.Printf("Logged in ok")
112+
continue
113+
}
114+
115+
// possibly breaking news?
116+
if msg[0].Product == "hermes" && msg[0].Project == "push" && len(msg[0].Body) > 0 {
117+
body, err := decodeMessageBody(msg[0].Body)
118+
if err != nil {
119+
log.Printf("decoding err: %s msg: %s", err, msg)
120+
continue
121+
}
122+
log.Printf("NEW! %s %s: %s", body.SubType, body.Label, body.Title)
123+
}
124+
125+
default:
126+
log.Printf("No idea what this is: %s\n", msgBuf[:bufLen])
127+
}
128+
}
129+
130+
log.Printf("Exiting")
131+
}
132+
func decodeMessageBody(body string) (res MessageBody, err error) {
133+
err = json.Unmarshal([]byte(body), &res)
134+
return
135+
}
136+
137+
func decodeServerMessage(buf []byte) (res ServerMessage, err error) {
138+
err = json.Unmarshal(buf, &res)
139+
if err != nil {
140+
return
141+
}
142+
if len(res) == 0 {
143+
err = fmt.Errorf("shouldn't be empty")
144+
}
145+
return
146+
}
147+
148+
func randCookie() string {
149+
letters := []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
150+
b := make([]rune, 30)
151+
for i := range b {
152+
b[i] = letters[rand.Intn(len(letters))]
153+
}
154+
return string(b)
155+
}
