View Full Version : Download closed captions from Hulu


warrior55
02-02-2014, 05:30 AM
Hi everyone, I'm interested in downloading closed captions from Hulu. I know how to get the audio/video, but for captions the only way that comes to mind is to record the video and then run OCR software on it. Maybe you know a better way to get the captions? Thanks for any answers.

blimey
02-02-2014, 07:49 AM
As long as they are turned on in the Flash player, Hulu sends them as a file that ends up saved locally.

warrior55
02-02-2014, 11:43 AM
Thank you for the answer, but could you be more specific? I tried watching the network activity in the Firefox developer tools and couldn't find anything useful. Maybe that's because Hulu uses some security (encryption or something) to prevent users from easily obtaining the subtitles. Anyway, it would be great if you or someone else could post short instructions for obtaining Hulu closed captions in a suitable format.

blimey
02-02-2014, 05:51 PM
Well, you're right. They are encrypted. But the encrypted file is saved locally in the browser cache. For example, when viewing this content:

h..p://www.hulu.com/watch/249959

in Internet Explorer, 40001982_US_en_en[1].smi is saved in the IE cache. It is easy to find this file using voidtools' Everything.
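
If you would rather not install a search tool, a short script can do the same lookup. This is only a sketch: the file name pattern is an assumption generalised from the single example name above, and you have to supply the cache folder yourself, since its location depends on the browser and Windows version.

```python
import fnmatch
import os


def find_caption_files(root, pattern='*_US_en_en*.smi'):
    """Walk `root` and return the paths of files whose names match `pattern`.

    The default pattern is a guess based on one example file name
    (40001982_US_en_en[1].smi); adjust it for other languages or regions.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively, since cache file names vary in case.
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Call it as find_caption_files(cache_dir), where cache_dir is whatever folder your browser uses for temporary files.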

warrior55
02-03-2014, 05:21 AM
I managed to get the .smi file, and it needs to be decrypted. I found some scripts for that, but they're quite old. Maybe you know a solution (script or software) that works at the moment?

denobis
03-26-2014, 02:37 PM
Well, the captions are in the SAMI format, like this:

<SYNC Encrypted="true" start="13967">9cb25a9dc6d0f588d3a29c82923fc284595f9bb77b7420a7600655df43d32355dabf25bb721e857e67591c0e24154924</SYNC>
etc...

So you only need to decrypt each <SYNC> element, which is encrypted with Rijndael-128 in CBC mode (you need the key and IV). For the example above, decryption returns "News Headquarters in New York,".
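
To make the per-<SYNC> step concrete, here is a rough Python 3 sketch of the part that needs no key: pulling out each encrypted element and turning the hex text into raw ciphertext bytes. The names extract_encrypted_cues, decrypt_cues and decrypt_cbc are made up for illustration. The actual Rijndael-128 CBC call is left to a function you pass in, because the key and IV are not given in this thread and the choice of AES library is up to you.

```python
import binascii
import re

# Matches one encrypted SYNC element of the shape shown in the example above.
SYNC_RE = re.compile(
    r'<SYNC\s+Encrypted="true"\s+start="(\d+)">\s*([0-9a-fA-F\s]+?)\s*</SYNC>',
    re.IGNORECASE,
)

AES_BLOCK_SIZE = 16  # Rijndael-128 operates on 16-byte blocks


def extract_encrypted_cues(sami_text):
    """Return a list of (start_ms, ciphertext_bytes), one per encrypted SYNC."""
    cues = []
    for start, hex_payload in SYNC_RE.findall(sami_text):
        # Line wrapping may have inserted whitespace into the hex string.
        hex_payload = re.sub(r'\s+', '', hex_payload)
        ciphertext = binascii.unhexlify(hex_payload)
        if len(ciphertext) % AES_BLOCK_SIZE != 0:
            raise ValueError('cue at %s ms is not a whole number of blocks' % start)
        cues.append((int(start), ciphertext))
    return cues


def decrypt_cues(cues, decrypt_cbc):
    """Apply a caller-supplied CBC decryptor (hypothetical) to every cue.

    `decrypt_cbc` takes ciphertext bytes and returns plaintext bytes; it is
    expected to wrap whatever AES library, key and IV you have available.
    """
    return [(start, decrypt_cbc(ciphertext)) for start, ciphertext in cues]
```

The example line above holds 96 hex characters, so it decodes to 48 bytes, which is exactly three cipher blocks.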

Barry
03-27-2014, 10:22 AM
I'm so sorry Denobis

It looks like none of the downloaders work anymore :(

And decrypting episodes is beyond my knowledge :(

Thanks anyway for all you have done, and will do in the future :)

Stream Ripper
04-30-2014, 11:12 PM
You CAN use a screen recorder tool such as Replay Video Capture (http://applian.com/replay-video-capture/) - I use it for Netflix, Hulu, etc. all the time WITH subtitles - no problem

AlvoErrado2
06-10-2015, 12:23 AM
Is there any script to decrypt the Hulu SMI subs?

I tried to download the video and the subs using the atresdownloader, the program identifies the files, but can't download them.

stinkfoot
06-10-2015, 01:44 AM
Yes, the SMI is encrypted. Yes, you can decrypt it with a PHP script, either the one from the Kodi script repository or another PHP script.

You can also choose WebVTT or TTML. Hulu does not encrypt WebVTT or TTML, so you can download those without dealing with encryption, and no PHP script is needed.

AlvoErrado2
06-10-2015, 09:13 AM
# Python 2 code for the XBMC/Kodi Hulu plugin; `common` is the plugin's own helper module.
import xbmc
import xbmcgui
import xbmcplugin
import common
import os
import binascii
import re
import math

import datetime


from BeautifulSoup import BeautifulSoup

try:
    from xml.etree import ElementTree
except:
    from elementtree import ElementTree

# Each entry is a (hex key, IV) pair taken from the plugin's key list.
subdeckeys = [ common.xmldeckeys[0] ]

class Main:
    def __init__( self ):
        pass

    def PlayWaitSubtitles(self, video_id):
        # Wait until playback has started, then attach the subtitles.
        while not xbmc.Player().isPlaying():
            print 'HULU --> Not Playing'
            xbmc.sleep(100)
        self.SetSubtitles(video_id)

    def SetSubtitles(self, video_id):
        subtitles = os.path.join(common.pluginpath,'resources','cache',video_id+'.srt')
        self.checkCaptions(video_id)
        if os.path.isfile(subtitles) and xbmc.Player().isPlaying():
            print "HULU --> Subtitles Enabled."
            xbmc.Player().setSubtitles(subtitles)
        elif xbmc.Player().isPlaying():
            print "HULU --> Subtitles Disabled."
        else:
            print "HULU --> No Media Playing. Subtitles Not Assigned."

    def checkCaptions(self, video_id):
        # Download and convert the captions unless a cached .srt already exists.
        subtitles = os.path.join(common.pluginpath,'resources','cache',video_id+'.srt')
        if os.path.isfile(subtitles):
            print "HULU --> Using Cached Subtitles"
        else:
            url = 'http://www.hulu.com/captions?content_id='+video_id
            xml = common.getFEED(url)
            tree = ElementTree.XML(xml)
            hasSubs = tree.findtext('en')
            if(hasSubs):
                print "HULU --> Grabbing subtitles..."
                subtitles = self.convert_subtitles(hasSubs)
                common.SaveFile(os.path.join(common.pluginpath,'resources','cache',video_id+'.srt'), subtitles)
                print "HULU: --> Successfully converted subtitles to SRT"
            else:
                print "HULU --> No subtitles available."

    def convert_subtitles(self, url):
        # Turn the SAMI document at `url` into SRT text.
        xml=common.getFEED(url)
        tree = ElementTree.XML(xml)
        lines = tree.find('BODY').findall('SYNC')
        srt_output = ''
        count = 1          # index of the SYNC element that follows the current one
        displaycount = 1   # running SRT cue number
        for line in lines:
            if(line.get('Encrypted') == 'true'):
                sub = self.decrypt_subs(line.text)
            else:
                sub = line.text
            sub = self.clean_subs(sub)
            if sub == '':
                count += 1
                continue
            start_ms = int(line.get('start'))
            start = self.convert_time(start_ms)
            if count < len(lines):
                # A cue ends where the next SYNC element starts.
                end = self.convert_time(int(lines[count].get('start')))
            else:
                # The last cue has no successor; show it for 2 seconds (an assumption).
                end = self.convert_time(start_ms + 2000)
            line = str(displaycount)+"\n"+start+" --> "+end+"\n"+sub+"\n\n"
            srt_output += line
            count += 1
            displaycount += 1
        return srt_output

    def decrypt_subs(self, encsubs):
        # Try each key; the first result containing a <P> element is accepted.
        encdata = binascii.unhexlify(encsubs)
        for key in subdeckeys[:]:
            cbc = common.AES_CBC(binascii.unhexlify(key[0]))
            subs = cbc.decrypt(encdata,key[1])
            substart = subs.find("<P")
            if (substart > -1):
                i = subs.rfind("</P>")
                subs = subs[substart:i+4]
                return subs

    def clean_subs(self, data):
        # Strip markup, collapse whitespace and decode HTML/XML entities.
        br = re.compile(r'<br.*?>')
        tag = re.compile(r'<.*?>')
        space = re.compile(r'\s\s\s+')
        sub = br.sub('\n', data)
        sub = tag.sub(' ', sub)
        sub = space.sub(' ', sub)
        # The first argument was probably an entity such as '&nbsp;' that the forum rendered as a space.
        sub = sub.replace(' ',' ').strip()
        if sub != '':
            sub = BeautifulSoup(sub,convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0].string.encode( "utf-8" )
            sub = BeautifulSoup(sub,convertEntities=BeautifulSoup.XML_ENTITIES).contents[0].string.encode( "utf-8" )
        return sub

    def convert_time(self, milliseconds):
        # Format a millisecond offset as an SRT timestamp, HH:MM:SS,mmm.
        seconds = int(float(milliseconds)/1000)
        milliseconds -= (seconds*1000)
        hours = seconds // 3600
        seconds -= 3600*hours
        minutes = seconds // 60
        seconds -= 60*minutes
        # %03d zero-pads the milliseconds; the original %3d padded with spaces.
        return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, milliseconds)


Looking at this Python script, I think this is the way.

But now I will have to study how to adapt this script for what I want. Stripping out the XBMC/Kodi-specific parts should get me somewhere.
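
As a starting point for that adaptation, the timing and SRT-building logic does not depend on XBMC/Kodi at all. Below is a minimal standalone Python 3 sketch of just that part. The function names and the 2-second duration for the final cue are my own assumptions, not something taken from the plugin, and fetching or decrypting the captions is not covered.

```python
def ms_to_srt_time(milliseconds):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    seconds, millis = divmod(int(milliseconds), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, millis)


def cues_to_srt(cues, last_duration_ms=2000):
    """Build SRT text from (start_ms, text) pairs sorted by start time.

    Each cue ends where the next entry starts. The final cue has no
    successor, so it is shown for `last_duration_ms` (an assumption).
    """
    blocks = []
    index = 1
    for i, (start, text) in enumerate(cues):
        text = text.strip()
        if not text:
            # Empty entries only mark where the previous subtitle ends.
            continue
        if i + 1 < len(cues):
            end = cues[i + 1][0]
        else:
            end = start + last_duration_ms
        blocks.append('%d\n%s --> %s\n%s\n' % (
            index, ms_to_srt_time(start), ms_to_srt_time(end), text))
        index += 1
    return '\n'.join(blocks)
```

Feed cues_to_srt a list such as [(13967, 'News Headquarters in New York,'), (16000, '')] and write the result to a .srt file.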

stinkfoot
06-10-2015, 02:27 PM
If you choose and download WebVTT or TTML from Hulu, then no decryption is needed. Do you have to have SMI instead of WebVTT or TTML?

AlvoErrado2
06-12-2015, 08:48 AM
I don't know how to change this option for this show. I'm not interested in the video, only in the subs; they are for my English classes.

http://www.hulu.com/watch/676589

stinkfoot
06-12-2015, 03:43 PM
Yes, I am trying to learn English too!
The best method is typing it out.
No cheating with Google Translate. Google Translate makes you lazy.

http://assets.huluim.com/captions_webvtt/333/60425333_US_en_en.vtt
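
Since a .vtt file like this is plain text, turning it into a readable transcript for study takes only a few lines. Here is a rough Python 3 sketch; the function name webvtt_to_text is made up for illustration, and it assumes you have already saved the file and read it into a string.

```python
import re

# A cue timing line looks like "00:00:13.967 --> 00:00:15.500".
TIMING_RE = re.compile(r'^\s*[\d:.]+\s+-->\s+[\d:.]+')
TAG_RE = re.compile(r'<[^>]+>')


def webvtt_to_text(vtt):
    """Return the caption text of a WebVTT document, one cue per line."""
    lines_out = []
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r'\r?\n\s*\r?\n', vtt.strip()):
        rows = block.splitlines()
        # Blocks without a timing line (header, NOTE, STYLE) are skipped.
        for i, row in enumerate(rows):
            if TIMING_RE.match(row):
                text = ' '.join(r.strip() for r in rows[i + 1:])
                text = TAG_RE.sub('', text).strip()  # drop <i>, <c> and similar tags
                if text:
                    lines_out.append(text)
                break
    return '\n'.join(lines_out)
```

For example, webvtt_to_text(open('60425333_US_en_en.vtt', encoding='utf-8').read()) gives the dialogue without timestamps.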

AlvoErrado2
06-12-2015, 06:05 PM
Thank you very much!

Rancher
10-19-2015, 01:41 PM
Is there a way to encrypt the subtitles (.smi, .ass, .srt…)? A script or something? Thank you in advance.

biezom
10-19-2015, 01:45 PM
hi

http://stream-recorder.com/forum/hulu-subtitles-t20120.html

Rancher
10-19-2015, 02:07 PM
I should have started a new topic, sorry. I mean encrypting, not decrypting, the subtitles. I want to protect some of the subs I have, similar to what Hulu or Crunchyroll did.