Welcome to NIWA Community Forums.
 

Creating Text Dumps

Started by embyr_75, August 11, 2013, 05:05:32 PM


embyr_75

Hello my fellow Wiki-people!

I'm already aware that this may be completely doomed, but I thought I'd post it anyway. I can't find any text dumps of the Metroid Prime series (I don't mean just the Logbook data; I mean an entire dump, with scans and everything). I know nothing about making text dumps, but I assume that if getting a dump of these games weren't difficult, someone would have done it already.

Anyway, since we have access to text dumps of all the Zelda games, I was wondering if anyone could give me some tips on how to get a text dump of a Prime series game, or explain why it hasn't been done yet. (P.S. Please use small words. :P)

Thanks!

Torchickens

Generally, to create a text dump you first need to build a table of characters: a list of which character each index number stands for (e.g. 00 = A, 01 = B). You can work this out by hacking, but you need to be able to check which character each index number produces. I'm not sure how you would do this in games where the text offsets aren't documented (if they were, it would just be a matter of opening a hex editor and checking an offset), but in theory you could use an emulator's cheat searcher and do a 'value different' search every time the character at a certain position on the screen changes. Eventually you should find an address that you can keep editing to change that character, and you can then write down which index number gives which character.
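To picture how those repeated 'value different' searches narrow things down, here is a rough sketch. It assumes you can save the emulator's RAM as raw snapshots yourself (a hypothetical workflow; a real cheat searcher does this bookkeeping for you), and the helper names `narrow` and `cheat_search` are made up for this example.

```python
def narrow(candidates, before, after, changed=True):
    """Keep addresses whose byte changed (changed=True) or stayed the same
    (changed=False) between two RAM snapshots."""
    return [addr for addr in candidates if (before[addr] != after[addr]) == changed]


def cheat_search(snapshots, changed_flags):
    """Run successive searches over a series of RAM snapshots.

    snapshots:     list of equal-length bytes objects, in the order they were taken.
    changed_flags: one bool per consecutive pair of snapshots, saying whether the
                   on-screen character changed between them.
    """
    candidates = list(range(len(snapshots[0])))  # every address starts as a suspect
    for before, after, changed in zip(snapshots, snapshots[1:], changed_flags):
        candidates = narrow(candidates, before, after, changed)
    return candidates


# Toy example: the character lives at address 2; address 1 happens to change once too.
snaps = [b"\x00\x05\x01", b"\x00\x06\x02", b"\x00\x06\x03"]
print(cheat_search(snaps, [True, True]))  # [2]
```

Each search throws away every address that didn't behave like the on-screen character, so after a few rounds only the real one should be left.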

Sometimes you may be lucky and the text is already stored in a standard character encoding such as Shift JIS. If so, just change the file extension to .html, open the file in a browser and switch the browser's character encoding to that encoding. That will show you all the text in the game that is stored that way, with some garbage around it. In Google Chrome you can do this from the menu icon at the top right, under Tools/Encoding. Try opening the game in several different encodings.
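The same "try several encodings" idea can also be sketched in Python instead of a browser (a swapped-in approach, not what the post describes). The list of candidate encodings is an assumption; add or remove codecs as needed. Since reading a whole disc image at once needs a lot of memory, you would normally pass in just a slice of the file.

```python
PREVIEW_ENCODINGS = ["ascii", "cp1252", "shift_jis", "utf-16-be", "utf-16-le"]


def preview(data: bytes, encodings=PREVIEW_ENCODINGS, width=60):
    """Decode the same bytes with each encoding; undecodable bytes become U+FFFD."""
    return {enc: data.decode(enc, errors="replace")[:width] for enc in encodings}


sample = "Hello".encode("utf-16-be")
for enc, text in preview(sample).items():
    print(f"{enc:>10}: {text!r}")
```

You still have to judge by eye which line looks like real text, just as you would when flipping encodings in Chrome.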

Once you have a table of characters, you can save it as a table file (.tbl) and load it into a hex editor such as WindHex to view the data decoded through that table. Note that for the Japanese Metroid games you may have to go to Option and tick 'View data as Unicode'.
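For the curious, this is roughly what a hex editor does with a table file. The sketch assumes the simple one-entry-per-line 'HEX=character' layout (e.g. 00=A); real table files can have extra features that this ignores, and `parse_tbl`/`decode_with_table` are hypothetical names.

```python
from typing import Dict


def parse_tbl(text: str) -> Dict[bytes, str]:
    """Parse table-file lines of the form 'HEX=character', e.g. '00=A' or '8140= '."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        hex_part, _, char = line.partition("=")  # split on the FIRST '=', so '3D==' works
        table[bytes.fromhex(hex_part.strip())] = char
    return table


def decode_with_table(data: bytes, table: Dict[bytes, str], unknown: str = "[{:02X}]") -> str:
    """Decode bytes with a table, preferring the longest matching entry at each position."""
    max_len = max((len(key) for key in table), default=1)
    out, i = [], 0
    while i < len(data):
        for size in range(min(max_len, len(data) - i), 0, -1):
            chunk = data[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:  # no entry matched: show the raw byte and move on
            out.append(unknown.format(data[i]))
            i += 1
    return "".join(out)


tbl = parse_tbl("00=A\n01=B\n02= \n8140=!\n")
print(decode_with_table(b"\x00\x01\x02\x81\x40\xFF", tbl))  # AB ![FF]
```

Trying the longest entry first lets two-byte codes (common in Japanese text) win over single-byte ones that share the same first byte.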

I hope this helps.
Hello. I identify as female. She/her pronouns please. :)


(credits to Moydow)

I spend a lot of my time making videos about Pokémon glitches on my Youtube channel ChickasaurusGL.

I'm the editor in chief of Starfy Wiki. Let me know or Tappy know of any bugs or other concerns. :)

Discord: Torchickens #3213
Twitter: Torchickens
Email: [email protected]

embyr_75

You definitely got me off to a good start; thanks!

I still have a lot of questions though. Since I wasn't sure exactly how to create a table of characters (I have no idea what "checking each character by index number" meant) I went ahead and tried converting the file to .html and reading it with Chrome. There was a ton of unreadable garbage in there, but plenty of English, too (and French and German. Go figure). Anyway, I'm assuming, since a lot of in-game text showed up in English, that I don't have to create a table file for it. BUT WAIT there's more.

When I change the game file to a .html file, it's still a huge file. So huge, in fact, that Chrome crashes before it can load all the text. I opened the .html file in WindHex and it handled it, but I still couldn't find any of the Scan Data, which I found very odd. I extracted all the root files I could find using Dolphin and started doing the same with those. There was even a Logbook.pak file that contains all the logbook entries. But I still can't find a file that contains all the Scan Data - or one that contains any Scan Data. Very frustrating. Anyone know what I'm doing wrong?

Torchickens

Quote from: embyr_75 on August 18, 2013, 12:58:27 AM
You definitely got me off to a good start; thanks!

Glad I could help!

I'm not sure what to suggest about the Scan Data, as I don't have the ISO and haven't played any Metroid games. It's possible that it's stored in a different encoding from the rest of the readable text (though I don't see why the developers would need to do that), or that the Scan Data text is stored as graphics, which would explain why you can't find it.
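One way to test the "different encoding" theory is to take a phrase you remember from a scan and search every extracted file for it, written out in several encodings. This is a sketch under assumptions: the encoding list is a guess, and the folder name and phrase in the usage line are placeholders.

```python
from pathlib import Path

SEARCH_ENCODINGS = ["ascii", "utf-16-be", "utf-16-le", "shift_jis"]


def find_phrase(data: bytes, phrase: str, encodings=SEARCH_ENCODINGS):
    """Return (encoding, offset) for every place the phrase occurs in any encoding."""
    hits = []
    for enc in encodings:
        try:
            needle = phrase.encode(enc)
        except UnicodeEncodeError:
            continue  # the phrase can't be written in this encoding at all
        pos = data.find(needle)
        while pos != -1:
            hits.append((enc, pos))
            pos = data.find(needle, pos + 1)
    return hits


def search_tree(root, phrase):
    """Search every file under `root` (e.g. the files extracted with Dolphin)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            for enc, offset in find_phrase(path.read_bytes(), phrase):
                yield path, enc, offset
```

Usage would look like `for path, enc, offset in search_tree("extracted", "a phrase from a scan"): print(path, enc, hex(offset))`. If nothing turns up in any encoding, the graphics explanation becomes more likely (or the files are compressed).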

Did the text in WindHex look the same as it did in Chrome with the right encoding? If not, you could try downloading Notepad++ (I believe it won't crash on large files): open the file, go to the 'Encoding' menu and choose the right encoding there, although changing the encoding and scrolling down may cause some lag.

If the Scan Data still doesn't appear, it's probably due to one of the two aforementioned reasons. In the case that they are graphics, there is a program called ConsoleTool that may be able to display them.

To elaborate on index numbers: each text character has a code number that defines it. I guess you could compare them to color codes, but really almost every value in memory is stored as an index number (e.g. a stackable item on an items screen, where 0A means you have ten of them); the hex column of an ANSI character chart is one example. A table file is just a list with one '(hex number)=(character)' entry per line. The hard part is finding out which index number each character uses: that takes hacking knowledge and can be difficult (I can only manage it in older games that have a name entry screen and a cheat searcher, etc.). So I'm not sure what to do if the Scan Data is in a non-standard encoding. However, since you found English text in a standard encoding, all of the text that isn't stored as graphics is probably in that encoding as well.
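A common trick for finding index numbers without a name entry screen is a "relative search" (a well-known ROM hacking technique, not something described in this thread). It assumes the letters A to Z are numbered consecutively in the game's encoding, which is common but not guaranteed: if you search for a word you know appears, like the hypothetical example "SAMUS", the gaps between its letters stay the same whatever number 'A' starts at.

```python
def relative_search(data: bytes, word: str):
    """Look for `word` stored in an unknown encoding whose letters A-Z are
    numbered consecutively. Returns (offset, value_of_A) pairs."""
    word = word.upper()
    if len(word) < 2 or not (word.isascii() and word.isalpha()):
        raise ValueError("use a word of at least two plain letters A-Z")
    # Gaps between neighbouring letters, e.g. 'S' -> 'A' is -18.
    deltas = [ord(b) - ord(a) for a, b in zip(word, word[1:])]
    hits = []
    for i in range(len(data) - len(word) + 1):
        window = data[i:i + len(word)]
        if all(window[j + 1] - window[j] == d for j, d in enumerate(deltas)):
            value_of_a = window[0] - (ord(word[0]) - ord("A"))
            if 0 <= value_of_a <= 255 - 25:  # the whole alphabet must fit in one byte
                hits.append((i, value_of_a))
    return hits


def alphabet_tbl_lines(value_of_a: int):
    """Table-file lines for A-Z, given the index number of 'A'."""
    return [f"{value_of_a + k:02X}={chr(ord('A') + k)}" for k in range(26)]


data = b"\x00\x11" + bytes([0x92, 0x80, 0x8C, 0x94, 0x92]) + b"\x00"  # 'SAMUS' with A = 0x80
print(relative_search(data, "Samus"))  # [(2, 128)]
print(alphabet_tbl_lines(0x80)[:3])    # ['80=A', '81=B', '82=C']
```

Once a hit gives you the number for 'A', the second helper writes the A to Z part of a table file; punctuation and lowercase letters still have to be found separately.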

I'm sorry if this isn't much help. I hope you're able to find the Scan Data soon.

embyr_75

Thanks so much, this is definitely going to help!! I'll mess around with it for a while, but in the meantime, thank you for the very informative replies.

I'll let you know when I hit my next roadblock.  :P

embyr_75

Update: Notepad++ still crashes when I open the entire .html file with it (it's a 1.35 GB file). I haven't tried looking for the scan data in other encodings yet, BUT I did find something interesting.
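A file that size can be handled without any editor by reading it a piece at a time and only keeping the readable runs. This is a swapped-in approach, sketched under the assumption that the text you want is plain single-byte ASCII (like the English that showed up in Chrome); it will not find UTF-16 or Shift JIS text.

```python
import io
import re


def extract_strings(stream, min_len=6, chunk_size=1 << 20):
    """Yield (offset, text) for runs of printable ASCII in a binary stream,
    reading one chunk at a time so a multi-GB file never sits in memory."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    carry, carry_offset = b"", 0  # unfinished run from the previous chunk, and its file offset
    while True:
        chunk = stream.read(chunk_size)
        buf = carry + chunk
        if not chunk:  # end of file: flush whatever is left
            for m in pattern.finditer(buf):
                yield carry_offset + m.start(), m.group().decode("ascii")
            return
        # A printable run touching the end of buf may continue in the next chunk,
        # so hold it back instead of reporting it early.
        tail = len(buf)
        while tail > 0 and 0x20 <= buf[tail - 1] <= 0x7E:
            tail -= 1
        for m in pattern.finditer(buf, 0, tail):
            yield carry_offset + m.start(), m.group().decode("ascii")
        carry, carry_offset = buf[tail:], carry_offset + tail


sample = b"\x00\x01HelloWorld\x00ab\x00Scan Data here\xff"
print(list(extract_strings(io.BytesIO(sample), min_len=4, chunk_size=3)))
# [(2, 'HelloWorld'), (16, 'Scan Data here')]
```

With a real disc image you would open it with `open(path, "rb")` (the path being wherever your ISO lives) and write each result to a text file, which is far smaller than the original and opens fine in Notepad++.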

There's a site, www.metroidconstruction.com, that specializes in disassembling Metroid games. They came up with a program (mpakdump) that takes all the files out of a .pak file and breaks them down into even smaller components. I can't make it work for some reason, though. I'm hoping I'll find the scan data in there, perhaps in its own little, not-cryptically-named file, if I can get it working. Does anyone else have any ideas, or want to try it out, since I'm failing so epically?  ;)

Toomai

Maybe you could try the free version of Hex Editor Neo: http://www.hhdsoftware.com/free-hex-editor

It has a bit of a learning curve and might be overkill for text searching, but it can handle the whole SSBM disc at once so filesize shouldn't be a problem.
oeuf

Torchickens

Quote from: Toomai on August 20, 2013, 09:08:46 PM
Maybe you could try the free version of Hex Editor Neo: http://www.hhdsoftware.com/free-hex-editor

Good idea. Note that you have to activate a 14-day trial to change the encoding (from View -> Encoding) if it isn't in 'Default ANSI'. It's just a matter of clicking a button from the message that tells you that feature is for the standard edition, though, and once the 14-day trial is up, you can still use the program, just without the features available in the full version.

embyr_75

Just to leave a follow up message on this:

Created three text dumps so far (for Metroid Prime NTSC, Metroid Prime PAL and Metroid Prime 2: Echoes, with Corruption on the way), thanks to the help of the very smart people at the m2k2 hacking forums. For anyone who's interested: the first step was extracting the internal folders using the Dolphin Emulator, and then decoding those folders using the tools already produced by Parax and others at m2k2. The text was scattered across those folders in STRG files, whose format they had already documented. There was an STRG editor available, but it couldn't process files in batches, so I had someone code up just such a batch-enabled-text-dump-spitter-outer for me.
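The actual batch tool isn't shown in the thread, so here is only a rough, heuristic sketch of the batch idea. It does not parse the STRG header or string tables that m2k2 documented; it assumes (unverified here) that the strings are stored as UTF-16 big-endian text and that the decoded files end in `.STRG`. Check both against the m2k2 documentation; a proper tool should read the offset tables instead of guessing. The folder and output names in the usage line are placeholders.

```python
import re
from pathlib import Path


def strings_in(data: bytes, min_chars=4):
    """Heuristically pull UTF-16 big-endian text out of raw bytes.

    Only catches characters U+0020-U+007E and U+00A0-U+00FF (basic Latin plus
    accented letters), each stored as a 0x00 byte followed by the character byte.
    """
    pattern = re.compile(rb"(?:\x00[\x20-\x7e\xa0-\xff]){%d,}" % min_chars)
    return [m.group().decode("utf-16-be") for m in pattern.finditer(data)]


def dump_tree(root, out_path, suffix=".strg"):
    """Write the text of every file with the given suffix under `root` into one dump."""
    root = Path(root)
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix.lower() == suffix:
                out.write(f"=== {path.relative_to(root)} ===\n")
                for text in strings_in(path.read_bytes()):
                    out.write(text + "\n")
```

Calling `dump_tree("decoded_files", "text_dump.txt")` would produce one text file with a header per STRG file. Line breaks inside a string split it into separate pieces, which is one more reason a format-aware parser is the better long-term tool.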

So that's the whole story. I know, the suspense was killing you. ;)


Espyo

Good stuff! With GameCube and Wii games, it's fairly easy to get text dumps of popular games because of the tools the community creates. And even without those tools, sometimes you can just open some files in a text or hex editor and find what you want.
Come to think of it, we should put a text dump on Pikipedia as well. It kind of bothers me though -- how much does this lean on the "nasty hacking stuff" side that Nintendo frowns upon?

embyr_75

Quote from: Espyo on October 10, 2015, 02:23:56 PM
Come to think of it, we should put a text dump on Pikipedia as well. It kind of bothers me though -- how much does this lean on the "nasty hacking stuff" side that Nintendo frowns upon?

Well, the alternative would be manually writing down every line in the game, which I don't think they would mind.  ;)

Espyo

Yeah, I was going to do it, but I realized the best way would be to get the data from every file that has text. And for all Pikmin games... that's a LOT of files. This might take a while.