pyTivo Discussion Forum
Answers and the development of pyTivo, a TiVo transcoding server
 

gmd's python metadata generator
choekstr



Joined: 06 Dec 2008
Posts: 152

Posted: Mon May 11, 2009 5:20 pm

Oh yeah! This is the true value in it. Much of TheAmigo's coding effort has gone into streamlining the script and making it work better in cron. For instance, my cron entry used to be:
Code:
0,10,20,30,40,50 * * * * /opt/pytivo/pyTivoMetaThis.py -a -m -t -p /ftp/pub/Video/ > /dev/null 2>&1
5,15,25,35,45,55 * * * * /opt/pytivo/pyTivoMetaThis.py -a -t -p /home/hoekstra/TV\ Shows > /dev/null 2>&1

and is now just a single entry without all the options:
Code:
0,10,20,30,40,50 * * * * /opt/pytivo/pyTivoMetaThis.py -p /ftp/pub/Video/ -p /home/hoekstra/TV\ Shows > /dev/null 2>&1

I would highly recommend using this in cron as it is a great feature.
TheAmigo



Joined: 14 Apr 2009
Posts: 33

Posted: Mon May 11, 2009 6:43 pm

choekstr wrote:
Code:
0,10,20,30,40,50 * * * * /opt/pytivo/pyTivoMetaThis.py -p /ftp/pub/Video/ -p /home/hoekstra/TV\ Shows > /dev/null 2>&1

BTW, since 0.20g you can drop the -p's too... they're silently ignored.
And if you're running a modern distro, you could replace your 0,10,20... with */10 to run every 10 minutes.
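To confirm that the two cron forms select the same minutes, here is a small Python sketch that expands a cron minute field. The helper is hypothetical and only understands the `*`, `*/N` and comma-list forms used in this thread; real cron also accepts ranges such as `0-30`, which it ignores.

```python
def expand_minute_field(field):
    """Return the sorted minutes (0-59) matched by a simple cron minute field.

    Only '*', '*/N' and comma-separated plain numbers are handled.
    """
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 60, step))  # */10 -> 0, 10, 20, ... 50
    return sorted(int(part) for part in field.split(","))


print(expand_minute_field("*/10"))  # [0, 10, 20, 30, 40, 50]
print(expand_minute_field("0,10,20,30,40,50") == expand_minute_field("*/10"))  # True
```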

felciano wrote:
What's the protocol for handling metadata for movies split among multiple files? E.g.

Empire Of The Sun - CD1 (1987).avi
Empire Of The Sun - CD2 (1987).avi

pyTivoMetaThis will look up each file at IMDb and create a separate metadata file for it. Since you'll want to be able to tell them apart on the TiVo, it looks for extra tags (e.g. CD1/2) and adds those to the name so you'll know which is which. So it will find tags (as choekstr mentioned), but with a caveat: the way your example is named, it thinks the name of the movie is "Empire Of The Sun - CD1". This causes two problems:
1) when looking up in IMDb, the best match is "The China Odyssey: 'Empire of the Sun', a Film by Steven Spielberg (1987) (TV)"
2) it doesn't recognize "CD1" as a tag.

Instead, if you put any tags or comments after the year: "Empire Of The Sun (1987) CD1.avi" it will work better as it knows that nothing after the year is part of the movie title:
1) the best match is now "Empire of the Sun (1987)"
2) it adds the tag "CD1" to the title as shown on the TiVo

When looking at a list of movies, there's no easy way to tell how much of the file name is the movie title. The best guess I could make is to look for a year (possibly in parens or brackets) and take everything before that as the title and everything after that as comments to be searched for tags. When it sees your file names, it thinks that CD1/2 is part of the title and there are no tags.
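The year-splitting rule can be tried directly in Python with the script's first-pass regular expression (the same one that appears in TheAmigo's patch later in this thread). Two changes are mine: the literal `[` in the character classes is escaped, because modern Python flags an unescaped leading `[` with a FutureWarning, and the results are stripped of surrounding spaces for readability. The `split_title` name is hypothetical.

```python
import re

# Script's first-pass pattern, with '[' and ']' escaped inside the classes.
# Group 1: title, group 4: year, group 6: everything after the year.
YEAR_RE = re.compile(
    r'(.*?\w+.*?)(?:([\[(])|(\W))((?:19|20)\d\d)(?(2)[\])]|(\3|$))(.*?)$')


def split_title(name):
    """Split a file name (without extension) into (title, year, rest), or None."""
    m = YEAR_RE.match(name)
    if not m:
        return None
    title, year, rest = m.group(1, 4, 6)
    return title.strip(), year, rest.strip()  # strip() added for readability


print(split_title("Empire Of The Sun - CD1 (1987)"))
# ('Empire Of The Sun - CD1', '1987', '')   CD1 ends up in the title
print(split_title("Empire Of The Sun (1987) CD1"))
# ('Empire Of The Sun', '1987', 'CD1')      CD1 is left over to be checked as a tag
```

The second call shows why putting tags after the year works better: only the text before the year is sent as the title.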

An alternative tagging method is to use the format that choekstr added support for in v0.20b:
Empire Of The Sun (CD1 1987).avi

bgiannes wrote:
anyone got this to run under Cron?

Even better than just running, since v0.20h (see post for details), it autodetects if it's being run from cron and chooses the best match automatically instead of prompting.
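The post doesn't say how the script detects cron. One common heuristic, assumed here rather than taken from the script, is to check whether standard input is attached to a terminal, which it normally is not under cron. A minimal sketch of "prompt when a person is there, otherwise take the best match" (both function names are hypothetical):

```python
import sys


def running_interactively():
    """Assumed heuristic: cron jobs normally have no terminal on stdin."""
    return sys.stdin is not None and sys.stdin.isatty()


def choose_match(matches, interactive=None):
    """Pick one lookup result from a best-first list, or None if it is empty.

    Prompts only when interactive and there is a real choice; otherwise
    silently returns the best (first) match.
    """
    if not matches:
        return None
    if interactive is None:
        interactive = running_interactively()
    if not interactive or len(matches) == 1:
        return matches[0]
    for i, match in enumerate(matches):
        print("%d) %s" % (i, match))
    answer = input("Choose a match [0]: ").strip()
    if answer.isdigit() and int(answer) < len(matches):
        return matches[int(answer)]
    return matches[0]  # empty or invalid answer falls back to the best match
```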

...and since all these features have been added cumulatively, getting the latest version (currently 0.20i by krkeegan) will do everything mentioned in this thread.

_________________
--
The Amigo
choekstr



Joined: 06 Dec 2008
Posts: 152

Posted: Mon May 11, 2009 7:00 pm

The code I added back in .20b was designed to strip out any CD1/2 tag regardless of where it occurs in the filename, so the string sent to IMDB was "Empire Of The Sun (1987)". It captured the tag and added it back in later so each part could be identified in the NPL (Now Playing List). Testing with -d (debug), it looks like it no longer behaves this way after all the recent changes and the rewrite.
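The approach described above (pull a CD tag out wherever it appears, look up the cleaned name, then put the tag back on the title shown in the NPL) can be sketched like this. The tag pattern and the `strip_tags` name are hypothetical; the script keeps its own tag list.

```python
import re

# Hypothetical tag pattern: CD, Disc or Part followed by a number.
TAG_RE = re.compile(r'\b(?:CD|Disc|Part)\s?\d+\b', re.IGNORECASE)


def strip_tags(name):
    """Remove tags anywhere in `name`; return (cleaned_name, tags)."""
    tags = TAG_RE.findall(name)
    cleaned = TAG_RE.sub('', name)
    cleaned = re.sub(r'\(\s+', '(', cleaned)            # "( 1987)" -> "(1987)"
    cleaned = re.sub(r'\s*-\s*(?=\(|$)', ' ', cleaned)  # drop a dangling " - "
    cleaned = re.sub(r'\s+', ' ', cleaned).strip()
    return cleaned, tags


print(strip_tags("Empire Of The Sun - CD1 (1987)"))  # ('Empire Of The Sun (1987)', ['CD1'])
print(strip_tags("Empire Of The Sun (CD1 1987)"))    # ('Empire Of The Sun (1987)', ['CD1'])
```

Because the pattern runs over the whole name, a movie whose real title contains something like "Part 2" would lose it; that is the false-positive risk TheAmigo raises below.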

I think we should add this back in, as it gives much more flexibility in naming. BTW, I name my files almost exactly like this, but without the '-':
Empire Of The Sun - CD1 (1987).avi
TheAmigo



Joined: 14 Apr 2009
Posts: 33

Posted: Mon May 11, 2009 8:18 pm

The reason I did it that way was to avoid matching tags if they appeared in the title of the movie. I don't have a large collection to test against, so I don't know if there's much risk of false positives. If you want to try it, just make this change:

Code:
--- i   2009-04-29 12:19:40.000000000 -0500
+++ j   2009-05-11 15:07:05.000000000 -0500
@@ -553,10 +553,10 @@
        # Using the year when searching IMDb will help, so try to find it.
        m = re.match(r'(.*?\w+.*?)(?:([[(])|(\W))((?:19|20)\d\d)(?(2)[])]|(\3|$))(.*?)$', title)
        if m:
+               (tags, junk) = extractTags(title)
                (title, year, soup) = m.group(1,4,6)
                debug(2,"    Title: %s\n    Year: %s" % (title, year))
                title += ' (' + year + ')'
-               (tags, soup) = extractTags(soup)
        else:
                # 2nd pass at finding the year.  Look for a series of tags in parens which may include the year.
                m = re.match(r'(.*?\w+.*?)\(.*((?:19|20)\d\d)\).*\)', title)


That'll search the whole file name for tags, not just the part after the year.

If the tags we're looking for don't show up in any movie titles, then making that change in the next release would be fine.

_________________
--
The Amigo
TheAmigo



Joined: 14 Apr 2009
Posts: 33

Posted: Wed May 13, 2009 7:37 pm    Post subject: v0.20j

Here's another update that adds a new feature: categorize by genre. To use it add the option: -g GenreDir

What it does: For all the movies it indexes, it will create a directory under GenreDir/ for all the genres it finds. Within each directory, it will create symlinks to all the movies in that genre. Then you can share the genre dir with pyTivo and browse your movies by genre.

There are a few caveats:
- It's unix only (should work on Mac OS X, but I don't have that for testing)
- It doesn't read existing metadata files to learn genres. So to index all your movies, you'll have to either run with -f (forces a metadata update) or delete the metadata files that you want updated.
- If you run with -g to create genre links and later delete (or move or rename) some movies, it won't know and the old (broken) links will still be there. Deleting the genredir automatically is too dangerous, so you'll have to do that by hand if you want to update all the links.
- When possible the symlinks are relative so if you move a common parent of your movies and genredir, they'll still work. This requires python 2.6+. Older versions of python will link to absolute paths and be more likely to break.
- So far, it only applies to movies, not TV shows. I archive many entire seasons, and I don't want individual links to every episode of ST: TNG in my Sci-Fi directory. I organize my shows by directory, so I'd like to have it just put a single link to the entire series in each applicable genre dir, but it's not easy to detect when TV shows are sorted into dirs by series.
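A rough sketch of the genre-link idea, including the relative-link behaviour from the caveats. `os.path.relpath` is the function that arrived in Python 2.6 and makes relative links possible. The function name and arguments are hypothetical; in the script, the genres come from the IMDb lookup.

```python
import os


def link_into_genres(movie_path, genres, genre_dir):
    """Create genre_dir/<genre>/<movie file> symlinks for one movie (Unix only).

    Links are relative, so moving a common parent of the movies and genre_dir
    keeps them valid. Existing links are skipped; stale links are never removed.
    """
    created = []
    name = os.path.basename(movie_path)
    for genre in genres:
        target_dir = os.path.join(genre_dir, genre)
        os.makedirs(target_dir, exist_ok=True)
        link = os.path.join(target_dir, name)
        if os.path.lexists(link):
            continue
        # Path to the movie as seen from inside the genre directory.
        rel_target = os.path.relpath(os.path.abspath(movie_path),
                                     os.path.abspath(target_dir))
        os.symlink(rel_target, link)
        created.append(link)
    return created
```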

My initial thought was to have subdirs so you could click through into multiple genres and see only the movies that match all of those genres. However, this could quickly get out of hand: if a movie had 10 genres listed, that'd be over 3 million directories, with a symlink to the movie in each one. Thus, for now, it only creates one level of directories. Going a 2nd level deep wouldn't be too bad, but I have to wonder how much need there is to search for movies based on two genres... feedback welcome.
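The directory count above is easy to check. Assuming each level of subdirectories offers every genre not already chosen, a movie with n genres appears in n!/(n-k)! directories at depth k:

```python
from math import perm  # Python 3.8+

n = 10  # genres listed for one movie
full_depth = perm(n, n)                                # every ordering of all 10 genres
all_levels = sum(perm(n, k) for k in range(1, n + 1))  # every depth from 1 to 10
two_levels = perm(n, 1) + perm(n, 2)                   # stop after the 2nd level

print(full_depth, all_levels, two_levels)  # 3628800 9864100 100
```

So full nesting costs about 3.6 million directories at the deepest level alone (roughly 9.9 million counting every level), while stopping at two levels needs only 100 for that movie.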

Someday, this may become moot if someone were to add the same functionality to pyTivo itself... but I'm not holding my breath.

Other changes in this version are minor:

rdian06 wrote:
You can dump -a. I added that before wgw changed the metadata code and posted the pyTivo templates. Maybe abort if the -a option is used and print an error pointing them to wgw's thread about the templates.

-a is now ignored and prints a message with a link to the templates thread.

The patch I'd posted in the previous message is now included. That means it searches for tags anywhere in the file name, so be careful when adding new tags to the list.



Attachment: pyTivoMetaThis-0.20j-amigo.py.zip (9.4 KB, downloaded 461 times)


_________________
--
The Amigo
lrhorer



Joined: 04 Mar 2008
Posts: 153

Posted: Sat May 23, 2009 8:31 pm    Post subject: Re: updated version

TheAmigo wrote:
Based on choekstr's 0.20b, I've made some more changes:


This program may be just what I wanted, but the download only contains the .py file: none of the auxiliary files which are apparently necessary, no listing of the command line options, and no user guide or feature set details. Where can one obtain "the whole enchilada", as it were?
choekstr



Joined: 06 Dec 2008
Posts: 152

Posted: Sat May 23, 2009 8:47 pm

There is nothing else needed for running pytivometathis in order to generate metadata files for your movies and TV Shows. What other files do you think are necessary and what are you trying to accomplish?

The commandline options are just a -h away:
Code:
root@quad:/opt/pytivo# ./pyTivoMetaThis.py -h
Usage: pyTivoMetaThis.py [options]

Options:
  -h, --help            show this help message and exit
  -d, --debug           Turn on debugging. More -d's increase debug level.
  -f, --force           Force overwrite of existing metadata
  -p, --path            Deprecated.  Directories may be listed without a -p,
                        default is '.'
  -t, --tidy            Save metadata to the .meta directory in video
                        directory. Compatible with tlc's patch
                        (http://pytivo.krkeegan.com/viewtopic.php?t=153)
  -m, --movie           Deprecated.  Silently ignored to prevent errors.
  -a, --alternate       Deprecated.  Use templates instead:
                        http://pytivo.krkeegan.com/pytivo-video-
                        templates-t618.html
  -i, --interactive     Deprecated.  Interactive prompts are automatically
                        supressed when run via cron or as a scheduled task.
  -r, --recursive       Generate metadata for all files in sub dirs too.
  -g GENRE, --genre=GENRE
                        Specify a directory in which to place symlinks to
                        shows, organized by genre.
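The help text above appears to come from Python's `optparse` module. As a sketch only (not the script's actual source), this shows how a parser of that shape can keep `-p` as a no-argument flag, so old `-p DIR` cron lines still work while the directories arrive as positional arguments. The `build_parser` and `parse` names are hypothetical, and `-m`, `-a` and `-i` are omitted for brevity.

```python
from optparse import OptionParser


def build_parser():
    parser = OptionParser(usage="%prog [options] [dir ...]")
    parser.add_option("-d", "--debug", action="count", default=0,
                      help="Turn on debugging. More -d's increase debug level.")
    parser.add_option("-f", "--force", action="store_true", default=False,
                      help="Force overwrite of existing metadata")
    # A flag with no argument: in "-p /some/dir" the dir becomes positional.
    parser.add_option("-p", "--path", action="store_true", default=False,
                      help="Deprecated.  Directories may be listed without a -p")
    parser.add_option("-t", "--tidy", action="store_true", default=False,
                      help="Save metadata to the .meta directory in video directory.")
    parser.add_option("-r", "--recursive", action="store_true", default=False,
                      help="Generate metadata for all files in sub dirs too.")
    parser.add_option("-g", "--genre", metavar="GENRE",
                      help="Directory in which to place symlinks to shows, by genre.")
    return parser


def parse(argv):
    """Return (options, dirs); dirs defaults to ['.'] like the help text says."""
    options, dirs = build_parser().parse_args(argv)
    return options, dirs or ["."]
```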
lrhorer



Joined: 04 Mar 2008
Posts: 153

Posted: Sat May 23, 2009 9:30 pm

choekstr wrote:
There is nothing else needed for running pytivometathis in order to generate metadata files for your movies and TV Shows.

That's not what the server thinks:

Code:
RAID-Server:/RAID/Server-Main/Downloads/Linux# python pyTivoMetaThis-0.20c-amigo.py
IMDB module could not be loaded. Movie Lookups will be disabled. See http://imdbpy.sourceforge.net
Warning, IMDB module not found, cannot lookup movies IMDB.


choekstr wrote:
What other files do you think are necessary

I don't know specifically. That's why I asked. Apparently some sort of IMDB module is required. Of what files this consists or what other files, if any, might be required, I haven't a clue.

choekstr wrote:
and what are you trying to accomplish?

Um, to create metadata files for my videos? I don't mean to sound obtuse, but for what other task would I likely be using the utility?

choekstr wrote:
The commandline options are just a -h away:

OK, but when I run the command, it shows less info than your list:

Code:
RAID-Server:/RAID/Server-Main/Downloads/Linux# python pyTivoMetaThis-0.20c-amigo.py -h
IMDB module could not be loaded. Movie Lookups will be disabled. See http://imdbpy.sourceforge.net
Usage: pyTivoMetaThis-0.20c-amigo.py [options]

Options:
  -h, --help            show this help message and exit
  -d, --debug           Turn on debugging. More -d's increase debug level.
  -f, --force           Force overwrite of existing metadata
  -p FILEDIR, --path=FILEDIR
                        The directory containing files to be looked up.
                        Defaults to .
  -t, --tidy            Save metadata to the .meta directory in video
                        directory. Compatible with tlc's patch
                        (http://pytivo.krkeegan.com/viewtopic.php?t=153)
  -m, --movie           Deprecated.  Silently ignored to prevent errors.
  -a, --alternate       Enable adding extended information to seriesTitle and
                        title for TV shows and to title for Movies
  -i, --interactive     If more than one match, script presents menu to choose
                        correct one
RAID-Server:/RAID/Server-Main/Downloads/Linux#


What's more, some detail is in order. For example, is the -p option recursive for child directories?
rdian06



Joined: 12 Apr 2008
Posts: 1420

Posted: Sat May 23, 2009 10:00 pm

lrhorer wrote:
OK, but when I run the command, it shows less info than your list: [...]
What's more, some detail is in order. For example, is the -p option recursive for child directories?


You have a slightly older version, .20c. The latest is .20j found here:

http://pytivo.krkeegan.com/post6356.html#6356

A bunch of us contributed to the script, but TheAmigo has been doing the most work on it recently.

The script looks up TV data via thetvdb.com without any extra modules, but if you want to enable Movie data lookup, you need to install IMDbPY:

http://imdbpy.sourceforge.net/

And IMDbPY relies on an XML parser. It can use the pure-Python BeautifulSoup parser included with IMDbPY, or you can add lxml for better performance.

Also note that, at least on my Mac OS X 10.4 machine, non-English characters in the IMDb data don't get translated to UTF-8 correctly in the metadata unless I have lxml installed. With BeautifulSoup, I get garbage characters instead of accented letters...

And to get lxml working you need libxml, libxslt, and python-lxml:

http://imdbpy.sourceforge.net/docs/README.newparsers.txt

At least on Windows there is one lxml package that will get libxml, libxslt, and python-lxml installed for you...

On the Mac it's a bit harder...
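The "IMDB module could not be loaded" warning earlier in the thread is the usual optional-import pattern: TV lookups need nothing extra, movie lookups need IMDbPY (imported as `imdb`), and lxml is an optional faster parser. A small sketch that checks what is installed without importing anything heavy; the helper names are hypothetical, and the fallback message follows rdian06's description of IMDbPY as of this thread.

```python
import importlib.util


def module_available(name):
    """True if a top-level module can be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def report_optional_modules():
    found = {
        "imdb": module_available("imdb"),  # IMDbPY: enables movie lookups
        "lxml": module_available("lxml"),  # optional, faster XML parser
    }
    if not found["imdb"]:
        print("IMDb module not found: movie lookups disabled (TV lookups still work).")
    elif not found["lxml"]:
        print("lxml not found: IMDbPY will use its slower pure-Python parser.")
    return found
```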

And -p is not recursive unless you add the -r (recursive) switch, available in .20j.
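To make the difference concrete: without -r only the files directly inside each listed directory are considered; with it, sub directories are walked too. A sketch in which the extension list, the skipping of `.meta` and the `find_videos` name are all assumptions rather than the script's actual behaviour:

```python
import os

VIDEO_EXTS = (".avi", ".mpg", ".mp4", ".mkv")  # assumed list, for illustration


def find_videos(top, recursive=False):
    """Yield video file paths under `top`; descend into sub dirs only if recursive."""
    if not recursive:
        for name in sorted(os.listdir(top)):
            path = os.path.join(top, name)
            if os.path.isfile(path) and name.lower().endswith(VIDEO_EXTS):
                yield path
        return
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place so os.walk skips the .meta metadata directory.
        dirnames[:] = sorted(d for d in dirnames if d != ".meta")
        for name in sorted(filenames):
            if name.lower().endswith(VIDEO_EXTS):
                yield os.path.join(dirpath, name)
```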
lrhorer



Joined: 04 Mar 2008
Posts: 153

Posted: Sat May 23, 2009 10:24 pm

Thanks! I installed the .deb package for IMDbPY from the official Debian "Lenny" distro, and the program no longer nags. I may have to investigate the XML parser if I have problems. I'm going to download the latest version of the script right now.

Oh, and I think I rest my case concerning the need for a little documentation.
lrhorer



Joined: 04 Mar 2008
Posts: 153

Posted: Sat May 23, 2009 10:39 pm

Alright, well, at this moment there is only one program on the server without a metafile, so I ran:

Code:
python pyTivoMetaThis-0.20j-amigo.py -d -p "/RAID/Recordings/Star Trek The Next Generation/"


The program dutifully started searching the STTNG directory and properly skipped all the programs with existing metafiles, but when it came to the one without a metafile, it returned:

Code:
--->working on: Star Trek Next Generation - Evolution (Recorded Mon May 04, 2009, SCIFIHD).mpg
Searching IMDb for: Star Trek Next Generation Evolution (Recorded Mon May 04, 2009, SCIFIHD)
No matches found.


It mentions nothing about thetvdb.com. Is this failing because the episode isn't on thetvdb.com, or for some other reason?
choekstr



Joined: 06 Dec 2008
Posts: 152

Posted: Sun May 24, 2009 1:20 am

If the filename doesn't follow the common SNNENN naming convention (i.e. S01E21), the script can't determine that it is a TV show, so it treats it as a movie.

This is a case where the script can only do so much and you have to meet it halfway. Name your files according to conventions that work well for automation and you should have no problem. In general, as long as the SNNENN convention appears somewhere in the filename, it works great.
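The TV-versus-movie decision described above comes down to a pattern match on the file name. A sketch, assuming a case-insensitive `SnnEnn` pattern; the script's real pattern may accept more variants, and `classify` is a hypothetical name.

```python
import re

# Assumed pattern: S, 1-2 digit season, E, 1-3 digit episode (any case).
EPISODE_RE = re.compile(r'[Ss](\d{1,2})[Ee](\d{1,3})')


def classify(filename):
    """Return ('tv', season, episode) if an SnnEnn marker is present, else ('movie',)."""
    m = EPISODE_RE.search(filename)
    if m:
        return ("tv", int(m.group(1)), int(m.group(2)))
    return ("movie",)


print(classify("Star Trek The Next Generation S03E21.avi"))  # ('tv', 3, 21)
print(classify("Star Trek Next Generation - Evolution (Recorded Mon May 04, 2009, SCIFIHD).mpg"))
# ('movie',)
```

The second name has no SnnEnn marker, which is why lrhorer's recording was sent to IMDb as a movie search.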

It is the movies that are tougher to find perfectly at IMDB. IMDB is a finicky beast, and if you don't match the voodoo naming properly you won't get IMDB to find it on the first try. Don't be afraid to open up the script and make some changes to match your movie naming convention and help it match better.
lrhorer



Joined: 04 Mar 2008
Posts: 153

Posted: Sun May 24, 2009 3:33 am

Oh, geez, that's totally backwards. If I have to look up the season and episode number manually using something like MasterCephus' metagenerator, then the script is pretty much superfluous. I can just use the manual lookup to create the metafile in the first place. Indeed, the most tedious and time consuming part of the entire manual process is trying to find the season and episode number. The rest is trivial and takes an almost insignificant amount of time and effort.

Also, what about TV shows that aren't series? Specials, documentaries, etc?
rdian06



Joined: 12 Apr 2008
Posts: 1420

Posted: Sun May 24, 2009 4:13 am

lrhorer wrote:
Oh, geez, that's totally backwards. If I have to look up the season and episode number manually using something like MasterCephus' metagenerator, then the script is pretty much superfluous. I can just use the manual lookup to create the metafile in the first place. Indeed, the most tedious and time consuming part of the entire manual process is trying to find the season and episode number. The rest is trivial and takes an almost insignificant amount of time and effort.

Also, what about TV shows that aren't series? Specials, documentaries, etc?


First off, the script is designed to work a certain way because of the original author's workflow. That workflow happens to match my workflow so the script does wonders for me.

You have to start with some reference data if you want automation. I rip DVDs and put the season/episode number in the filename. Then the script does the rest.

If it doesn't work for you, feel free to modify it. It seems your naming convention includes the episode title so you could try keying off of that. Or you could ask TheAmigo for some help if you're not inclined to write your own code.

But calling our script "backwards" because it doesn't meet your needs won't engender a lot of support.

Personally I don't mix TV shows and movies in folders. So my last mod to the script had the TV and movie modes separated. Others decided they would rather not have to pass an extra switch to control which mode of operation it was in so they took it out.

In any case, it won't handle specials and documentaries.
choekstr



Joined: 06 Dec 2008
Posts: 152

Posted: Sun May 24, 2009 2:33 pm

Yeah, the overall tone of this thread is that this tool doesn't do what you want ("be magical and just work"), and there is a lot of animosity and disdain toward a tool that a few of us have graciously helped develop for your and others' potential use.
If you don't like it, don't use it. It is that simple.
If you think it is backwards, change it. You have the source code!
If you think it needs documentation (it hasn't been needed so far, since others have figured it out from what's given), then please create some and give back.

The whole point is that this is free, open source and you aren't paying for anything so you have no rights to complain. You only have rights to change it for your liking like the rest of us have.

And you might want to look into using the season and episode naming since a good portion of the industry/scene uses this as a convention and many tools use this. Any metadata tool has to have "something" to lookup to identify the file and download the metadata. That is either going to be season and episode numbers, or episode title. We all agree the season and episode numbers are the way to go as it is succinct, absolute and way less prone to interpretation error. Automation is all about removing any chance of error.

And just to be perfectly clear, this script is NOT meant to identify your files for you. It assumes your files are already identified and downloads metadata for them. It populates things like episode title, directors, actors, ratings, etc. into a metadata file, based on what it can derive about the file from the filename.
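In other words, the end product is just a text file next to the video. A sketch of that last step, assuming pyTivo's plain `key : value` metadata format, a `<video file>.txt` file name (or the `.meta` subdirectory when -t is used), and illustrative key names such as `episodeTitle` and `vActor`; check the pyTivo templates thread for the exact keys. `write_metadata` is a hypothetical name.

```python
import os


def write_metadata(video_path, fields, tidy=False, force=False):
    """Write `fields` as 'key : value' lines for one video.

    `fields` maps a key (assumed pyTivo key names) to a string or a list of
    strings; lists become repeated lines, e.g. one vActor line per actor.
    Returns the path written, or None if metadata exists and force is False.
    """
    directory, name = os.path.split(video_path)
    if tidy:  # -t style: keep metadata in a .meta subdirectory
        directory = os.path.join(directory, ".meta")
        os.makedirs(directory, exist_ok=True)
    meta_path = os.path.join(directory, name + ".txt")
    if os.path.exists(meta_path) and not force:
        return None  # existing metadata is left alone unless forced (-f)
    with open(meta_path, "w", encoding="utf-8") as out:
        for key, value in fields.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                out.write("%s : %s\n" % (key, v))
    return meta_path
```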
All times are GMT
Page 10 of 20

 
Site is in NO WAY affiliated with TiVo Inc
