An analysis of Rose Lalonde’s MEOW DNA code

My apologies to my regular readers: this is something of a divergence from my usual subject matter. Just click ‘back’ to get to previous posts on the top left above this article if you want to read one of my usual articles about vaccines, immunity or whooping cough. Unless you are familiar with Homestuck (a short, happy story about some friends who play a videogame together while being pestered by internet trolls), this post is unlikely to make any sense to you, and unless you are beyond Act 4 of Homestuck, it contains spoilers.

The background: On this page of Homestuck we see a page of Rose’s MEOW journal, with a 952-character excerpt in the Journalog below it. Each of these letters represents one of the four DNA nucleotide bases, apparently M=G, E=C, O=A and W=T (which I’ll refer to as ‘MEOW = GCAT’ for short), as we discover in this part of the flash [S] Jack: Ascend:

I have not been able to find an analysis online that goes far enough in depth into this code to satisfy me, so I went ahead and performed one myself. I looked as hard as I could for any biological relevance of the DNA sequence, as well as for a hidden message in it, not only with the key MEOW=GCAT, which [S] Jack: Ascend indicates is the correct substitution, but also with all twenty-four possible nucleotide base substitutions. All of those can be accessed/downloaded below for anyone who wishes to look for messages in there themselves. Before you do, though, I should say that I don’t believe there is any hidden information to be found in Rose’s MEOW code.
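If you would like to follow along at home, the substitution itself is easy to apply in a few lines of Python. This is only a minimal sketch: the function name `meow_to_dna` is my own invention for illustration, and the only thing taken from the comic is the key M=G, E=C, O=A, W=T.

```python
# Hypothetical helper; only the key (M=G, E=C, O=A, W=T) comes from the comic.
MEOW_TO_DNA = str.maketrans("MEOW", "GCAT")


def meow_to_dna(meow: str) -> str:
    """Translate a MEOW string into DNA bases, ignoring whitespace and case."""
    cleaned = "".join(meow.split()).upper()
    unexpected = set(cleaned) - set("MEOW")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return cleaned.translate(MEOW_TO_DNA)


print(meow_to_dna("MEOW"))       # GCAT
print(meow_to_dna("meow woem"))  # GCATTACG
```

Paste the Journalog excerpt in place of the toy strings to get the DNA version used in the rest of this post.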

I investigated a few theories of how Hussie might have come up with the string of letters that makes up the MEOW code, and I found evidence for what was possibly the least satisfying one of all… that Hussie just made up a random sequence of the letters M, E, O and W using his trademark keyboard-hitting:

AH: Hit the M, E, O and W keys nearly 1000 times to make up Rose’s Journalog

I’ve tried to generate a random, long string of DNA bases myself in the past, and quickly resorted to copying and pasting chunks of text I had already typed out in order to finish the job. Hoping that Hussie might have got bored mid-sequence and done the same as I had, I put the MEOW code into a repeat finder, and found it was indeed full of repeated sequences.

I took a day to identify all of the repeated sequences and colour-code them. The image below is the MEOW code, where every time the same sequence appears, it is the same colour. As you can see, there are no repeats until about 270 letters into it (each coloured sequence appears only once until then). From that point on, though, the sequence is basically just a wall of colour, indicating it’s made almost exclusively of strings of letters we’ve seen before:


The MEOW code, presented as MEOW=GCAT. Each time the same sequence appears, it is labelled in the same colour. Numbers on the left correspond to the adjacent letter’s place in the sequence. Annotated using Serial Cloner v2.6.1
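I did the annotation above by hand in Serial Cloner, but a rough version of the same check can be sketched in Python. To be clear, this is not how Serial Cloner works internally; it is a simple sliding-window approach, and the window length `k` is an arbitrary threshold I have picked for illustration. A letter counts as ‘repeated’ if it sits inside a window of `k` letters that has already appeared earlier in the sequence.

```python
def repeated_coverage(seq: str, k: int = 12):
    """Return (fraction_covered, first_repeat_start).

    A position counts as covered when it lies inside a k-letter window
    that already started at an earlier position in seq. The default k
    is an assumption, not a value taken from any tool.
    """
    seen = set()
    covered = [False] * len(seq)
    first_repeat = None
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in seen:
            if first_repeat is None:
                first_repeat = i  # 0-based index of the first repeated window
            for j in range(i, i + k):
                covered[j] = True
        else:
            seen.add(window)
    fraction = sum(covered) / len(seq) if seq else 0.0
    return fraction, first_repeat


# Toy example: the last 7 of these 17 letters repeat the first 7.
fraction, start = repeated_coverage("GATTACACCGGATTACA", k=4)
print(fraction, start)
```

Run on the real sequence, the second value gives a quick estimate of where the copying and pasting begins, and the first gives the share of the code made of material seen before. The exact numbers will depend on the `k` you choose.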

Over 70% of the MEOW code consists of strings of letters that have already occurred earlier in it, and these strings are so long that the likelihood of any of them occurring twice by chance in a sequence of this length is less than 1/10,000.
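For a feel of why long repeats are so unlikely by chance, here is a back-of-the-envelope sketch. It is a simplified model and not necessarily the calculation behind the figure above: it assumes every letter is drawn independently and with equal probability, which real keyboard-mashing certainly isn’t. Under that assumption, any given pair of k-letter windows matches with probability 1/4^k, so the expected number of matching pairs is the number of window pairs divided by 4^k. The probability of seeing at least one such repeat can be no larger than that expected count.

```python
from math import comb


def expected_chance_repeats(n: int, k: int, alphabet: int = 4) -> float:
    """Expected number of pairs of identical k-letter windows in a sequence
    of n letters drawn uniformly and independently (a simplifying assumption)."""
    windows = n - k + 1
    return comb(windows, 2) / alphabet ** k


# 952 is the length of the Journalog excerpt; the k values are arbitrary.
for k in (10, 15, 20, 25):
    print(k, expected_chance_repeats(952, k))
```

In this model the expected count for a 952-letter sequence drops below 1 in 10,000 once the repeated string is 17 letters or longer.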

I’m sorry to say it, but the MEOW code really does just look like it was created by whacking those four keys about three hundred times, then copying and pasting bits of what was already typed out to finish it off.

What sold it for me was when I realised the very final repeat of the brown sequence follows the letters ‘TGAT’, whereas previously it always followed the red sequence. I then realised that TGAT are the last four letters of the red sequence – indicating that someone had incompletely copied and pasted part of both sequences to make this very final addition to the end of the code:

That thing I just said


So the code is exactly what you’d expect to see if someone tried to enter a random sequence, but got bored part way through and just copied and pasted chunks of what they’d already typed in to finish it off.

I’ve got to give it to Hussie though, despite over 70% of the code being repeated sequences, he actually did a pretty good job of randomising their order to make it less obvious, as you saw from the distribution of colours throughout the previous image.

What follows are my attempts to find any information in the MEOW code. I don’t believe there’s anything to find, but it was worth a try to see if there was anything interesting there.

So how might information be encoded in the sequence? Well the two most promising ways would be:

  1. The journal excerpt could be a chunk of a DNA sequence from a real creature. The code is used in canon to create Bec, so it would be pretty cool if it were actually part of a dog genome sequence, right?
  2. Or, a message may be written in there. Yes, written. Not in M, E, O and W, or the DNA bases G, C, A and T, but in the protein encoded by the DNA sequence. Groups of three DNA bases encode individual amino acids. The standard genetic code specifies twenty amino acids, and each is represented by a letter of the alphabet (the six letters not used in this code are B, J, O, U, X and Z). In this way it is possible to encode whole sentences in a DNA sequence (in reality those strings of amino acids are proteins, but on a computer it’s a cool way to encode some information).

So, might Hussie have included part of an actual gene in this sequence? Thankfully, the National Center for Biotechnology Information allows us to submit a query DNA sequence, and it will search through the biggest assembled database of DNA sequences (including dog genome sequences) for ones that match the query. So what do we get when we perform a search for sequences matching the MEOW = GCAT sequence using the BLAST algorithm?

The DNA sequence of MEOW=GCAT was entered into a BLASTn algorithm search. The sequence is represented above in red, and each of the matching sequences below is in blue, indicating that it has a very poor alignment score. The grey lines indicate where ‘matching’ sequences needed to be broken apart in order to align the query sequence.

Above is the graphical output of the BLASTn algorithm. Each of the matching sequences is shown in blue, indicating its poor alignment score (a score of only “40-50” according to the key; perfect matches are typically red). Each rectangle is a ‘matching’ sequence. Not only do they score poorly, but you can see each rectangle only covers a small portion of the sequence (its position under the sequence indicates which part it aligns with).

Disappointingly, the DNA sequence encoded in Rose’s journal does not match any sequences from the largest accessible database. All we see are small chunks of the sequence that happen to match (with a poor score) short chunks of other sequences – exactly what you’d expect if you entered any long, meaningless, made-up DNA code.

However, on the off chance that MEOW=GCAT was not the right key, I performed BLASTn searches for all possible MEOW to G, C, A or T substitution variants. None of the reports finds significant similarity with any sequences in the database, but if you’re curious you can view each of the BLASTn reports below (note, the ‘ACGT’ link means the code I used was MEOW=ACGT, while in ‘GCAT’ I used MEOW=GCAT, etc.)

ACGT ACTG AGCT AGTC ATCG ATGC CAGT CATG CGAT CGTA CTAG CTGA GACT GATC GCAT GCTA GTAC GTCA TACG TAGC TCAG TCGA TGAC TGCA
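If you would rather generate the twenty-four variants yourself than download them, the keys are just the permutations of the four bases. Here is a minimal sketch; `all_substitutions` is a made-up name for illustration, and the naming follows the links above (so ‘ACGT’ means MEOW=ACGT).

```python
from itertools import permutations


def all_substitutions(meow: str) -> dict[str, str]:
    """Map each key name (e.g. 'GCAT', meaning MEOW=GCAT) to the translated sequence."""
    cleaned = "".join(meow.split()).upper()
    return {
        "".join(key): cleaned.translate(str.maketrans("MEOW", "".join(key)))
        for key in permutations("ACGT")
    }


variants = all_substitutions("MEOWWOEM")
print(len(variants))     # 24
print(variants["GCAT"])  # GCATTACG
print(variants["ACGT"])  # ACGTTGCA
```

Each resulting string can then be pasted into a BLASTn search by hand.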

Okay, so none of the possible MEOW/GCAT substitutions matches any archived genome, gene or RNA sequences. But what about the second information-hiding possibility I mentioned? Could a message be written in there? Well, three DNA bases encode a single amino acid, which means there are three reading frames depending on which base you start at, and three more if you choose to decode the sequence backwards. So in short, any DNA sequence could potentially encode six different strings of amino acids. Here are the six encoded by Rose’s MEOW code, where the substitution MEOW=GCAT is used. Each row is one of the six possible reading frames:


The DNA sequence in Rose’s Journal, where M=G, E=C, O=A and W=T. Below the sequence are the amino acids encoded in it. The top three rows of amino acids are reading frames 1, 2 and 3. The bottom three rows (reading frames -1, -2 and -3) should be read in reverse. Generated using ‘A plasmid Editor v2.0.47’

Disappointingly, I can’t make out any significant words, let alone sentences, in this code.

However, on the off chance that Hussie may have encoded something, but with a key other than MEOW = GCAT, I did the same with all 24 possible MEOW to G/C/A/T substitutions. I could not make out any messages in them, but if you want to see for yourself, each can be viewed in PDF format in these links. The name of each file indicates what key was used (eg. ‘GCAT’ uses the key MEOW = GCAT).

ACGT ACTG AGCT AGTC ATCG ATGC CAGT CATG CGAT CGTA CTAG CTGA GACT GATC GCAT GCTA GTAC GTCA TACG TAGC TCAG TCGA TGCA TGAC
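The six-frame translation described earlier can also be reproduced without any special software. The sketch below uses the standard genetic code with `*` marking stop codons. The function names are my own, and the numbering of the reverse frames is an assumption: here frame -1 simply starts at the first base of the reverse complement, which may not line up with how A plasmid Editor labels them. Each reverse frame is printed in its own reading direction, so there is no need to read it backwards.

```python
# Standard genetic code, laid out in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at offset; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frames(dna: str) -> dict[int, str]:
    """Frames 1..3 read the sequence as given; -1..-3 read its reverse complement."""
    reverse = reverse_complement(dna)
    frames = {n + 1: translate(dna, n) for n in range(3)}
    frames.update({-(n + 1): translate(reverse, n) for n in range(3)})
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Feed it the sequence under any of the twenty-four keys and look for English words in the output. Remember that B, J, O, U, X and Z can never appear.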

In summary, I have not been able to find any real genetic information or coded messages in Rose’s MEOW Journalog excerpt, and I believe this is because there are none there. Hussie likely just created the code randomly and finished it by copying and pasting bits of it he had already typed out, as evidenced above.

Fellow Homestuck fan, I’m sorry the outcome of this analysis wasn’t more exciting. I was earnestly expecting some message from Hussie in the amino acids, or failing that, to discover it was part of a dog gene, and I was a little disappointed on finding the truth. I hope this post wasn’t too much of a let-down. But hey, you wouldn’t be a Homestuck fan if you weren’t able to read walls of text with the occasional picture, only to be disappointed by it, right?

If you are interested in performing any of these analyses for yourself, or taking them further, please find the MEOW code here. Each of the tools used in the analysis is free and linked where mentioned in the post. The colour codes in the ‘repeats’ analysis have been simplified: there are four copy/paste variants of the red sequence, and two each of the blue and brown sequences, where they seem to have been copied incompletely prior to pasting. A version with multiple shades of these colours to illustrate this can be viewed here. Feel free to ask questions in the comments :o)