
Parsing Reddit comments – Python Reddit API Wrapper (PRAW) tutorial p.2


What’s going on, everybody, and welcome to part two of the Python Reddit API Wrapper (PRAW) tutorial mini-series. In this tutorial, we’re going to at least begin to parse comments. As I said at the end of the last video, comments represent a different kind of challenge for a variety of reasons, mainly the fact that comments aren’t perfectly in order: they form a tree of data, not a linear form of data.

Anyways, I’m going to remove the subreddit.subscribe() call, but the rest of this code can remain. Just underneath it, let’s continue. First of all, there are two stickied posts, so I’m going to limit this to three; that way we just do one submission for now. Now I’m going to come down here, and we can reference the comments by just saying comments = submission.comments. This gives us the comments.
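
A minimal sketch of the setup described above, assuming PRAW’s subreddit.hot(limit=3) listing and the stickied flag on submissions. first_unstickied is a hypothetical helper, not part of PRAW; it only reads plain attributes, so it can be tried without network access or credentials.

```python
from typing import Any, Iterable, Optional


def first_unstickied(submissions: Iterable[Any]) -> Optional[Any]:
    """Return the first submission whose .stickied flag is False.

    Works with any iterable of objects exposing a boolean .stickied
    attribute, such as the listing from subreddit.hot(limit=3).
    Returns None if every submission is stickied (or there are none).
    """
    for submission in submissions:
        if not submission.stickied:  # skip pinned announcement posts
            return submission
    return None
```

With a configured praw.Reddit instance named reddit (credentials assumed), you might write submission = first_unstickied(reddit.subreddit("python").hot(limit=3)) and then comments = submission.comments.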

Now we can say for comment in comments. Let’s print a separator 20 characters long to give us some separation, and then we’re going to print the comment. But just like a submission, each comment is one of these PRAW objects, and printing the object itself just gives you its ID, so you reference an attribute instead. One of the attributes is body, for the body of that comment. So that’s our comment, and we can at least iterate through comments that way. Let’s run that real quick. These are all of our comments. Now let me pull up that thread again; I guess I closed out of it.
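
The loop just described can be sketched like this, assuming only that each item exposes a .body attribute, as PRAW comment objects do; iterating submission.comments yields the top-level comments. format_top_level and the 20-dash separator are illustrative choices, not the video’s exact code.

```python
from typing import Any, Iterable, List


def format_top_level(comments: Iterable[Any], width: int = 20) -> List[str]:
    """Build the lines the tutorial prints: a separator, then each comment body.

    `comments` is any iterable of objects with a .body attribute,
    e.g. submission.comments in PRAW.
    """
    lines: List[str] = []
    for comment in comments:
        lines.append("-" * width)   # visual separation between comments
        lines.append(comment.body)  # the comment text rather than the object's ID
    return lines
```

You could then print each returned line in turn.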

There are six comments total here, but some of these are replies. For example, there’s a reply saying that if you’re unfamiliar, you should do yourself a favor and look into pandas. If we look for it in our output, it’s not here; at least, I’m pretty sure it’s not there, so these would be just the top-level comments. I just want to be 100% sure. Sorry for wasting your time; I think I closed it again, because I’m bad at closing things. Anyway, I’m pretty sure it’s not there, so what we need to do is get the replies.

Now, we could just say for reply in comment.replies, but there might not be any replies, so first we can check if len(comment.replies) > 0. Again, if you didn’t know replies existed, you could have done a dir() on comment, or you could read the documentation. So if len(comment.replies) > 0, meaning we have some replies, then we say for reply in comment.replies and print the reply, labeled as a reply, remembering to use .body on that as well.
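
A sketch of the one-level approach, assuming (as in PRAW) that comment.replies is a sized, iterable collection of reply comments. It also includes the dir() trick for discovering attributes such as replies. Both helper names are hypothetical. Note that it stops at direct replies, so a reply to a reply is never visited.

```python
from typing import Any, Iterable, List, Tuple


def public_attributes(obj: Any) -> List[str]:
    """List non-underscore attribute names: the dir() trick for exploring an object."""
    return [name for name in dir(obj) if not name.startswith("_")]


def top_level_with_replies(comments: Iterable[Any]) -> List[Tuple[str, str]]:
    """Return (kind, body) pairs for top-level comments and their direct replies.

    Replies to replies are deliberately not visited.
    """
    rows: List[Tuple[str, str]] = []
    for comment in comments:
        rows.append(("comment", comment.body))
        if len(comment.replies) > 0:  # the same check used in the video
            for reply in comment.replies:
                rows.append(("reply", reply.body))
    return rows
```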

Okay, so here you get a reply. It’s just me: a really great, high-quality reply. And here’s another reply; this is the comment I searched for a second ago, so there we caught that reply about pandas. But then, I think I closed this; let me open it again.

Someone complained that in my videos I just murder my Enter key. It’s true. Okay, there you go. So here’s the reply pointing to pandas, but then there’s another comment underneath that, right? So we would basically have to say, at this reply, if len(reply.replies) > 0, and so on, but you have no idea how deep down the rabbit hole the comment tree goes. That’s slightly problematic. The solution is that we can take submission.comments and add .list() to it, and this will list out all of the comments. I believe .list() is purely a PRAW feature: it’s not something that’s available to you in Python itself, or even in the Reddit API. Anyways, that doesn’t matter.

Let me go ahead and close this so we’ve got a nice clean slate. Also, we just want to print comment.body; we don’t really want to do the replies anymore, so let’s take that out real quick.

In this case, we’ve run through all of them. Here you go: here’s the second-level reply. Unfortunately, we have absolutely no contextual data for it. We don’t really know where it was in the whole thread; for example, you wouldn’t know which comment it was a reply to.

Now, what .list() does is basically take all the top-level comments and list those out, then go down to the second-level comments and list all of those out, then the third level, and so on.
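
To make that ordering concrete, here is a breadth-first flattening sketch written with collections.deque. It mirrors the level-by-level order described for .list() but is a stand-in, not PRAW’s own code; it assumes each comment exposes a .replies iterable and leaves placeholder objects without .replies unexpanded. In real code you would simply call submission.comments.list().

```python
from collections import deque
from typing import Any, Iterable, List


def flatten_breadth_first(top_level: Iterable[Any]) -> List[Any]:
    """Return every comment: top level first, then second level, and so on.

    Items without a .replies attribute (e.g. "load more" placeholders)
    are kept in the output but not expanded.
    """
    result: List[Any] = []
    queue = deque(top_level)
    while queue:
        item = queue.popleft()                      # oldest queued item = shallowest level
        result.append(item)
        queue.extend(getattr(item, "replies", ()))  # children wait behind the current level
    return result
```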

So one option you have is, rather than printing just comment.body, to also grab and print the parent ID, and that would be comment.parent(). Do note that’s not an attribute; it’s an actual new API call, which in my opinion is super unfortunate. I wish that was supplied, and I don’t think that’s a mistake; I believe that’s just how Reddit does it. I realize not every comment necessarily has a parent comment, but pretty much every comment has a parent: either the parent is the actual submission, or the parent is another comment. These are little tiny ID strings, so I really think they should be included, but they’re not; it’s a new API call.

So anyway, we print comment.parent(), and then the comment ID, which is just comment.id; that one actually is an attribute. Crazy. I can’t remember for a submission, but I’m pretty sure the submission contains the subreddit ID; I could be wrong, though. Anyway, that’s okay. So now we can get the parent ID and the comment ID of every comment, and then print the comment body.
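
A sketch of collecting these identifiers. One hedged addition: current PRAW comment objects also carry a parent_id attribute holding the parent’s fullname, prefixed t3_ when the parent is the submission and t1_ when it is another comment, so the parent’s ID can be read from that attribute without calling parent(). parent_kind and id_rows are hypothetical helper names.

```python
from typing import Any, Iterable, List, Tuple


def parent_kind(parent_id: str) -> str:
    """Classify a Reddit fullname: 't3_' marks a submission, 't1_' a comment."""
    prefix, _, _ = parent_id.partition("_")
    return {"t3": "submission", "t1": "comment"}.get(prefix, "unknown")


def id_rows(comments: Iterable[Any]) -> List[Tuple[str, str, str]]:
    """Return (parent_id, comment_id, body) for each comment, reading attributes only."""
    return [(c.parent_id, c.id, c.body) for c in comments]
```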

And now you’ve got the parent ID and the comment ID of everything. From that point, you could begin to do some pretty cool stuff, but the first thing I want to show you is this: let’s say that instead of the python subreddit, we use news, a very, very popular subreddit. If this doesn’t work, I’ll try politics or something, but we should hit an error here. There we go, there’s the error. If you use .list() and actually iterate through all the comments, chances are you’re eventually going to wind up with this error: 'MoreComments' object has no attribute 'parent'.

What’s happening there has to do with really long comment chains. For example, let me go to the news subreddit; it would be this post about a marijuana company buying an entire US town to create a cannabis-friendly municipality, which is going to have lots of comments. Right away you can see this "load more comments" link; that’s a MoreComments object. Even though Reddit looks super simple, I’m pretty sure that until you click this, those comments aren’t loaded: clicking it makes a new call, an actual call to their database. The same goes for "continue this thread": that’s a new call that loads more data. All of this data is not loaded on your initial page load; that would be nuts, since the page would never finish loading. So if you wanted to continue iterating through those comments, you would need to either handle this with an exception or something like that, or, as another option, replace the MoreComments objects.
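
The exception-handling route mentioned above might look like this sketch. It assumes, as the error message suggests, that MoreComments placeholders lack the attributes a real comment has; it catches AttributeError on .body and counts the skipped placeholders instead of crashing. With PRAW installed, an explicit isinstance(item, praw.models.MoreComments) check is the more targeted alternative. bodies_skipping_more is a hypothetical name.

```python
from typing import Any, Iterable, List, Tuple


def bodies_skipping_more(items: Iterable[Any]) -> Tuple[List[str], int]:
    """Collect comment bodies, skipping objects that have no .body.

    Returns (bodies, skipped), where `skipped` counts the placeholder
    objects (such as PRAW's MoreComments) that were passed over.
    """
    bodies: List[str] = []
    skipped = 0
    for item in items:
        try:
            bodies.append(item.body)
        except AttributeError:
            skipped += 1  # a "load more comments" stub, not a real comment
    return bodies, skipped
```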

For example, coming down here to comments.list(), one option you have is to use .replace_more(). I’m kind of starting to add a few too many things here, but let’s just do it: I’ll add the .list() down here, then we’ll add .replace_more(), and for now we’ll say limit=0. At some point, you will run into limits with replace_more(); there are only so many MoreComments objects it will replace (I think it’s 30 or something like that), which is still a lot of comments, because each replacement loads in a bunch of comments. Just keep in mind that you’re going to run out eventually. It won’t error if you do run out of the option to continue replacing, though; instead, it just tosses the rest, so you won’t hit an actual error anymore. Anyways, let’s go ahead and run this real quick. I should probably remove the parent() call, since that’s going to slow me down.

Hmm, let’s see: submission.comments, then replace_more()... Okay, fine. We take .list(), and then come over here and call comments.replace_more(). So first we had converted it to list form, which still contains these MoreComments objects, and only then tried to replace them; I just did it backwards. This should work now. It’s still going to be a lot of queries to the API, but hopefully we’ll get through it. Are you kidding me? What have I done? So: comments.replace_more(), where comments = submission.comments.

Please. So where is it? submission.comments.replace_more(limit=0), and now for comment in comments... let’s see, no: for comment in submission.comments. I really feel like I should have been able to chain that. Someone can comment below what the fix should have been, because I don’t see why I wasn’t able to chain those together, but obviously I’m messing up something. So: for comment in submission.comments.list(). Let me try that, and drink some more coffee. There we go, not a problem. That’s going to run forever, though, so I’m just going to break out of it; with that many API calls, it would probably throttle me eventually anyway. As you can see, we’ve now got all the parent IDs and comment IDs. Everything’s hunky-dory, and we’re doing great. Go ahead and close this out.
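
For reference, a sketch of the ordering that works, based on PRAW’s documented CommentForest API: call replace_more() on the forest first, then iterate the flattened .list(). replace_more() returns the list of MoreComments instances it did not replace rather than the forest itself, which is one likely reason the chained attempt failed; and .list() returns an ordinary Python list, which has no replace_more(). limit=0 replaces nothing and simply removes every MoreComments placeholder; PRAW’s default limit is 32, and limit=None keeps fetching until none remain, at the cost of one API request per replacement. all_comments is a hypothetical wrapper name.

```python
from typing import Any, List, Optional


def all_comments(submission: Any, limit: Optional[int] = 0) -> List[Any]:
    """Return every comment in a submission as a flat list.

    Order matters: replace_more() works on the CommentForest
    (submission.comments), so it must run before .list() turns the
    forest into a plain Python list.
    """
    forest = submission.comments
    forest.replace_more(limit=limit)  # returns leftover MoreComments, so don't chain on it
    return forest.list()
```

Used against a real submission, for comment in all_comments(submission) would then give you objects whose parent_id, id, and body you can print.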

So that’s how you can iterate through all the comments. Now, the question is: how might you rebuild that comment tree? At some point you’ve got to rebuild it. One option is to build a dictionary or something like that, where for each parent you’ve got the parent ID, the parent content, and then all the replies: parent ID, content, all replies; parent ID, content, all replies. If you did that, you could rebuild the tree yourself. I’m not going to go through all that here, since I don’t see much point in covering it in video, but if you are interested, you can go to part 2 of this tutorial series on Python Programming, and there’ll be an example there of truly rebuilding those comment trees. That’s one way you could do it, and it’s how I would do it, but I’m sure somebody could come up with a better way.
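
One way to act on the dictionary idea above, sketched under assumptions: every comment has .id, .parent_id (a fullname such as t1_... or t3_...) and .body, as PRAW comments do, and the input is a flat list such as submission.comments.list() after replace_more(). The sketch maps each parent fullname to its child comments, then walks down from the submission’s fullname ("t3_" plus its ID), indenting by depth. It is not the example from the written tutorial, just one possible approach, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def children_by_parent(comments: Iterable[Any]) -> Dict[str, List[Any]]:
    """Map each parent fullname (t3_... or t1_...) to its direct replies, in input order."""
    children: Dict[str, List[Any]] = defaultdict(list)
    for comment in comments:
        children[comment.parent_id].append(comment)
    return children


def render_tree(comments: Iterable[Any], submission_id: str, indent: str = "  ") -> List[str]:
    """Rebuild the thread as indented lines, starting from the submission."""
    children = children_by_parent(comments)
    lines: List[str] = []

    def walk(fullname: str, depth: int) -> None:
        for child in children.get(fullname, []):
            lines.append(indent * depth + child.body)
            walk("t1_" + child.id, depth + 1)  # a comment's own fullname is t1_<id>

    walk("t3_" + submission_id, 0)
    return lines
```

The recursion depth equals the thread depth, which stays small for Reddit threads.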

Anyways, in the next tutorial we’re going to talk about streaming from Reddit. So far this has all been historical grabbing from Reddit, but there’s also a way you can actually stream data from Reddit, and that’s what we’re going to be doing in the next tutorial. If you’ve got questions, comments, concerns, whatever, feel free to leave them below. Otherwise, I will see you in the next tutorial.
